TL;DR
August brings four Data Factory upgrades that remove orchestration friction and lower time-to-value for governed ingestion in Fabric:
- Copy job multi-schedule support (one job, many schedules) → fewer pipelines/triggers to maintain.
- Reset incremental copy (safe “rewind” of watermarks) → faster recovery from bad loads/schema drifts without full reloads.
- Auto table creation at the destination → cut setup time for new sources/landing zones.
- JSON format support in Copy job → simpler semi-structured ingestion (no pre-conversion step).
Related platform update worth noting: the On-premises Data Gateway release (Aug 2025) adds features that help Fabric pipelines, e.g., a Lakehouse connector, data consistency improvements in Copy Activity, and Entra ID support for PostgreSQL.
Why these matter (business value, not just “new buttons”)
1) One Copy job, many schedules
What changed: You can configure multiple schedules inside a single Copy job (e.g., 15-min micro-batches 9am–6pm + a nightly reconciliation). Previously you’d clone jobs or add pipeline logic.
Why you should care:
- Lower operational surface area: fewer assets to govern, secure, and monitor.
- Cleaner change control: one place to alter cadence when business cycles change.
- Cost hygiene: avoid “forgotten” timers on duplicate jobs.
Use it for: sales/ops systems with daytime SLAs + nightly catch-ups; marketing extracts that spike during campaigns.
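Conceptually, multi-schedule means one job evaluated against several cadences instead of several cloned jobs with one timer each. The sketch below is illustrative only (schedule shapes and names are assumptions, not Fabric API objects; in Fabric you configure this in the Copy job UI), but it shows the "daytime micro-batch plus nightly reconciliation" pattern from the example above:

```python
from datetime import datetime, time

# Hypothetical schedule definitions -- illustrative, not Fabric objects.
SCHEDULES = [
    # 15-minute micro-batches during business hours (09:00-18:00)
    {"kind": "interval", "minutes": 15, "window": (time(9, 0), time(18, 0))},
    # nightly reconciliation run at 02:00
    {"kind": "daily", "at": time(2, 0)},
]

def should_run(now: datetime) -> bool:
    """Return True if any of the job's schedules fires at this minute."""
    for s in SCHEDULES:
        if s["kind"] == "interval":
            start, end = s["window"]
            if start <= now.time() <= end and now.minute % s["minutes"] == 0:
                return True
        elif s["kind"] == "daily":
            if (now.hour, now.minute) == (s["at"].hour, s["at"].minute):
                return True
    return False
```

The point of the pattern: both cadences live in one definition, so a change request touches one asset instead of two or more duplicated jobs.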
2) Reset incremental copy (rewind without re-platforming)
What changed: Copy job can reset its incremental watermark so you can re-ingest from a known point (e.g., after a source fix or schema drift) without a total truncate-and-reload.
Why you should care:
- Resilience: recover from incidents fast; preserve downstream SLOs.
- Governance: auditable, intentional reprocessing (paired with change tickets).
- Capacity control: reload only what’s needed.
Use it for: late-arriving transactions, partial source backfills, or when a faulty transformation polluted a slice of Bronze.
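To make the "rewind" idea concrete, here is a minimal Python sketch of watermark-based incremental copy with a reset. The state class and function names are assumptions for illustration; the real mechanism is built into the Fabric Copy job:

```python
from dataclasses import dataclass

@dataclass
class IncrementalState:
    # Highest source modification marker already ingested (e.g., epoch seconds)
    watermark: int

def incremental_copy(rows: list[dict], state: IncrementalState) -> list[dict]:
    """Copy only rows newer than the watermark, then advance it."""
    new_rows = [r for r in rows if r["modified"] > state.watermark]
    if new_rows:
        state.watermark = max(r["modified"] for r in new_rows)
    return new_rows

def reset_watermark(state: IncrementalState, to: int) -> None:
    """Rewind to a known-good point; the next run re-ingests from there."""
    state.watermark = to
```

After a bad load, resetting the watermark to just before the affected window re-ingests only that slice, instead of a full truncate-and-reload.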
3) Auto table creation at the destination
What changed: Copy job can auto-create destination tables (e.g., in a Lakehouse) during first run.
Why you should care:
- Faster onboarding: new feeds land with minimal setup.
- Scale with less ceremony: pilot→prod migrations stop getting blocked on “table missing” churn.
Governance tip: keep auto-create on for Bronze only; require modeled schemas (PBIP/TMDL) for Silver/Gold to protect semantics.
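Under the hood, auto-create amounts to inferring a schema from the first batch and issuing a create-if-not-exists. The sketch below shows that idea with a deliberately naive type mapping; the mapping and function are assumptions for illustration, not Fabric's actual inference rules:

```python
# Illustrative Python-to-SQL type mapping -- not Fabric's real inference.
PY_TO_SQL = {int: "BIGINT", float: "DOUBLE", str: "STRING", bool: "BOOLEAN"}

def infer_ddl(table: str, sample_row: dict) -> str:
    """Build a CREATE TABLE IF NOT EXISTS statement from one sample row."""
    cols = ", ".join(
        f"{name} {PY_TO_SQL.get(type(value), 'STRING')}"
        for name, value in sample_row.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
```

This is also why the governance tip above matters: first-batch inference is convenient for Bronze, but you don't want production Silver/Gold schemas defined by whatever the first row happened to contain.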
4) Native JSON ingestion
What changed: JSON format support in Copy job lets you land semi-structured data without an intermediate conversion step.
Why you should care:
- Broader source coverage: logs, telemetry, partner APIs.
- Less glue code: fewer notebooks or ad-hoc scripts just to parse JSON.
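This is the kind of glue code that native JSON ingestion replaces for the first hop. As a reference point, here is a minimal flattener of the sort that often lives in an ad-hoc notebook (the function is hypothetical, not a Fabric API); with JSON support in Copy job you land the raw documents directly and keep logic like this, if needed at all, in Silver:

```python
import json

def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into prefixed columns; keep lists as JSON strings."""
    out = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))
        elif isinstance(value, list):
            out[name] = json.dumps(value)  # defer array handling downstream
        else:
            out[name] = value
    return out
```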
Platform enablers you shouldn’t overlook (Aug 2025)
- Gateway release (Aug 2025):
  - Lakehouse connector for Fabric Pipeline (simplifies on-prem → OneLake landing)
  - Data consistency improvements in Copy Activity
  - Microsoft Entra ID support for PostgreSQL (cleaner auth posture)
- Monthly feature roundup: keep an eye on Fabric’s August feature summary to align roadmap and training.
Where these upgrades fit in a pragmatic Fabric architecture
Reference pattern (governed & cost-sane):
- Bronze (Raw landing): Copy job → Lakehouse tables
  - Multi-schedule: micro-batch (business hours) + nightly reconciliation
  - JSON support for API/log sources
  - Auto-create tables on first load
- Silver (Curated): Dataflow Gen2 or Spark notebooks apply schema, PII handling, and SCD logic
- Gold (BI/Apps): Warehouse/Semantic Models published via CI/CD (PBIP/TMDL); cost controls on refresh windows
- Resets: Use Reset incremental copy to reprocess specific windows instead of full reloads
“When not to use it” matrix
- High-volume CDC from OLTP with strict latency? Consider database mirroring or native CDC to Lakehouse first; use Copy job for periodic reconciliations. (Cross-check Fabric Data Factory roadmap & mirroring guidance.)
- Complex JSON with deep nesting/evolving schema? Land raw, then standardize in notebooks/Dataflow Gen2; still leverage JSON ingestion for the first hop.
- Strictly modeled enterprise marts (Gold): avoid auto-create; enforce schemas via CI/CD.
Risk & controls
- Change control: Treat schedule changes and resets as change-managed events (ticket + owner).
- Data lineage: Ensure lineage from Copy job → Lakehouse → downstream models is visible for impact analysis.
- Auth posture: Where possible, prefer Entra-based auth for sources (see gateway update).
Conclusion
The August Data Factory updates in Microsoft Fabric aren't just technical conveniences; they're operational accelerators. Multi-scheduling simplifies orchestration, reset incremental copy builds resilience into pipelines, auto table creation speeds up onboarding, and JSON support extends your reach into semi-structured data. For CIOs and data leaders, these features mean fewer moving parts to govern, faster recovery from incidents, and quicker time-to-insight across business-critical systems.