Data Factory in Fabric (Aug 2025): The upgrades you should actually use


TL;DR

  1. Copy job multi-schedule support (one job, many schedules) → fewer pipelines/triggers to maintain.
  2. Reset incremental copy (safe “rewind” of watermarks) → faster recovery from bad loads/schema drifts without full reloads.
  3. Auto table creation at the destination → cut setup time for new sources/landing zones.
  4. JSON format support in Copy job → simpler semi-structured ingestion (no pre-conversion step).

Related platform update worth noting: On-premises Data Gateway (Aug 2025) adds features that help Fabric pipelines—e.g., Lakehouse connector, data consistency improvements in Copy Activity, and Entra ID support for PostgreSQL.


Why these matter (business value, not just “new buttons”)

1) One Copy job, many schedules

What changed: a single Copy job can now run on multiple schedules (e.g., business-hours micro-batches plus a nightly reconciliation).
Why you should care:

  • Lower operational surface area: fewer assets to govern, secure, and monitor.
  • Cleaner change control: one place to alter cadence when business cycles change.
  • Cost hygiene: avoid “forgotten” timers on duplicate jobs.
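
To make this concrete, here is a minimal Python sketch against the Fabric REST API's item-schedules endpoint, attaching two schedules to one Copy job. Treat it as an assumption-laden illustration: the jobType value ("CopyJob"), the payload shapes, and all IDs are placeholders to verify against the current API reference.

```python
# Minimal sketch: attach two schedules to one Copy job via the Fabric REST API.
# ASSUMPTIONS: the Job Scheduler "Create Item Schedule" endpoint shape, the
# jobType value "CopyJob", and the payload fields below — verify in your tenant.
import requests

TOKEN = "<entra-id-bearer-token>"   # e.g., acquired via MSAL
WORKSPACE = "<workspace-id>"
COPY_JOB = "<copy-job-item-id>"
URL = (f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE}"
       f"/items/{COPY_JOB}/jobs/CopyJob/schedules")

def create_schedule(config: dict) -> None:
    """POST one schedule definition; raise on any non-2xx response."""
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"enabled": True, "configuration": config},
    )
    resp.raise_for_status()

# Micro-batch cadence: every 30 minutes while the schedule is active.
create_schedule({
    "type": "Cron",
    "interval": 30,                          # minutes between runs
    "startDateTime": "2025-09-01T08:00:00",
    "endDateTime": "2026-09-01T08:00:00",
    "localTimeZoneId": "W. Europe Standard Time",
})

# Nightly reconciliation run at 02:00.
create_schedule({
    "type": "Daily",
    "times": ["02:00"],
    "startDateTime": "2025-09-01T00:00:00",
    "endDateTime": "2026-09-01T00:00:00",
    "localTimeZoneId": "W. Europe Standard Time",
})
```

Both schedules live on the same Copy job, so cadence changes stay in one governed place.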
2) Reset incremental copy

What changed: a Copy job's incremental watermark can now be reset (a safe "rewind"), so a specific window is reprocessed without forcing a full reload.
Why you should care:

  • Resilience: recover from incidents fast; preserve downstream SLOs.
  • Governance: auditable, intentional reprocessing (paired with change tickets).
  • Capacity control: reload only what’s needed.

Use it for: late-arriving transactions, partial source backfills, or when a faulty transformation polluted a slice of Bronze.
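
Before rewinding, it helps to clear only the polluted slice so the rerun cannot double-load it. A minimal PySpark sketch for a Fabric notebook, assuming a Bronze Delta table with a load_date column (both names illustrative):

```python
# Minimal sketch: targeted cleanup of a polluted Bronze slice before a
# watermark reset. Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the slice the faulty transformation touched.
spark.sql("""
    SELECT load_date, COUNT(*) AS rows
    FROM bronze.transactions
    WHERE load_date BETWEEN '2025-08-10' AND '2025-08-12'
    GROUP BY load_date
""").show()

# Delete just that window (Delta supports targeted DELETE), then reset the
# Copy job's incremental watermark and rerun for the same window.
spark.sql("""
    DELETE FROM bronze.transactions
    WHERE load_date BETWEEN '2025-08-10' AND '2025-08-12'
""")
```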

3) Auto table creation at the destination

What changed: Copy job can auto-create destination tables (e.g., in a Lakehouse) during the first run.
Why you should care:

  • Faster onboarding: new feeds land with minimal setup.
  • Scale with less ceremony: pilot→prod migrations stop getting blocked on “table missing” churn.

Governance tip: keep auto-create on for Bronze only; require modeled schemas (PBIP/TMDL) for Silver/Gold to protect semantics.
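
One way to honor that tip is to review what auto-create actually inferred before anything downstream depends on it. A minimal PySpark sketch; the table and column names are illustrative:

```python
# Minimal sketch: inspect an auto-created Bronze table after the first run.
# Table/column names and the expected type are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# What did auto-create infer? Review names and types before Silver depends on them.
spark.sql("DESCRIBE TABLE bronze.partner_feed").show(truncate=False)

# Optional guardrail: fail fast if a key column landed with an unexpected type.
schema = {f.name: f.dataType.simpleString()
          for f in spark.table("bronze.partner_feed").schema.fields}
assert schema.get("order_id") == "bigint", f"unexpected type: {schema.get('order_id')}"
```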

4) Native JSON ingestion

What changed: JSON format support in Copy job—land semi-structured data without intermediate conversions.
Why you should care:

  • Broader source coverage: logs, telemetry, partner APIs.
  • Less glue code: fewer notebooks or ad-hoc scripts just to parse JSON.
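
Once the Copy job has landed JSON in the Lakehouse, a notebook can standardize it in one hop. A minimal PySpark sketch of flattening landed files into a Bronze table; the path and field names are assumptions:

```python
# Minimal sketch: flatten JSON landed by the Copy job into a Bronze table.
# The Files path and the order/items structure are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("Files/landing/partner_api/2025/08/*.json")

# Flatten one level of nesting and explode a line-items array into rows.
flat = (raw
        .select(
            col("order.id").alias("order_id"),
            col("order.created_at").alias("created_at"),
            explode(col("order.items")).alias("item"))
        .select("order_id", "created_at",
                col("item.sku").alias("sku"),
                col("item.qty").cast("int").alias("qty")))

flat.write.mode("append").saveAsTable("bronze.partner_orders")
```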

Platform enablers you shouldn’t overlook (Aug 2025)

  • Gateway release (Aug 2025):
    • Lakehouse connector for Fabric Pipeline (simplifies on-prem → OneLake landing),
    • Data consistency improvements in Copy Activity,
    • Microsoft Entra ID support for PostgreSQL (cleaner auth posture).
  • Monthly feature roundup: keep an eye on Fabric’s August feature summary to align roadmap and training.

Where these upgrades fit in a pragmatic Fabric architecture

Reference pattern (governed & cost-sane):

  • Bronze (Raw landing): Copy job → Lakehouse tables
    • Multi-schedule: micro-batch (business hours) + nightly reconciliation
    • JSON support for API/log sources
    • Auto-create tables on first load
  • Silver (Curated): Dataflow Gen2 or Spark notebooks apply schema, PII handling, and SCD logic (see the sketch after this list)
  • Gold (BI/Apps): Warehouse/Semantic Models published via CI/CD (PBIP/TMDL); cost controls on refresh windows
  • Resets: Use Reset incremental copy to reprocess specific windows instead of full reloads
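
A minimal sketch of the Silver-layer SCD step referenced above: a type-1 upsert from a Bronze slice into a curated Delta table via the Delta Lake Python API. Table and key names are illustrative:

```python
# Minimal sketch: SCD type-1 upsert from Bronze into Silver on Delta Lake.
# Table names and the customer_id key are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.table("bronze.customers_latest")   # today's Bronze slice
target = DeltaTable.forName(spark, "silver.customers")

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()      # overwrite changed attributes (type 1)
 .whenNotMatchedInsertAll()   # insert new customers
 .execute())
```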

“When not to use it” matrix

  • High-volume CDC from OLTP with strict latency? Consider database mirroring or native CDC to Lakehouse first; use Copy job for periodic reconciliations. (Cross-check Fabric Data Factory roadmap & mirroring guidance.)
  • Complex JSON with deep nesting/evolving schema? Land raw, then standardize in notebooks/Dataflow Gen2; still leverage JSON ingestion for the first hop.
  • Strictly modeled enterprise marts (Gold): avoid auto-create; enforce schemas via CI/CD.

Risk & controls

  • Change control: Treat schedule changes and resets as change-managed events (ticket + owner).
  • Data lineage: Ensure lineage from Copy job → Lakehouse → downstream models is visible for impact analysis.
  • Auth posture: Where possible, prefer Entra-based auth for sources (see gateway update).

Conclusion

These Copy job upgrades (multi-schedule support, incremental resets, auto table creation, and native JSON ingestion) shrink the number of assets you operate and shorten recovery from bad loads. Adopt them in Bronze first, keep Silver and Gold behind modeled schemas and CI/CD, and treat schedule changes and resets as change-managed events.
