The promise of zero-ETL is tantalizing: connect your live systems to Microsoft Fabric’s OneLake, and downstream analytics and models just see fresh data without you having to build and maintain tedious pipelines. In early proofs of concept, that idea often works well. But when teams scale toward full pilots or production, hidden limits and schema fragilities tend to show up — and slowly erode the promise.
In my years as a data engineer working on Fabric deployments, I've seen that what breaks pilots is rarely a failure of vision or technology; it's a failure to anticipate the fine print. This article walks through where mirroring tends to crack, what to watch for, and how to set realistic expectations.
What Mirroring Actually Gives You — and What It Doesn’t
Before diving into edge cases, here’s a clear, grounded picture of what Fabric mirroring offers — and where assumptions often overstep reality.
What it does:
- Captures changes (inserts, updates, deletes) from a source database and lands them in Fabric (OneLake) as Delta tables (Parquet under the hood), so that your SQL endpoints, semantic models, or Spark workloads can query fresh data (see the sketch after this list).
- Hides much of the plumbing: change tracking, incremental ingestion, schema mapping, merging deltas.
- Lets you offload read and analytics workloads to Fabric, reducing load on your source systems.
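To make the "no pipelines" point concrete, here is a minimal sketch of what consumption looks like from a Fabric notebook. The names `sales_mirror` and `dbo.orders` are hypothetical placeholders; the point is that mirrored data behaves like any other Delta table to Spark.

```python
# Minimal sketch: querying a mirrored table from a Fabric notebook.
# "sales_mirror" and "dbo.orders" are hypothetical placeholder names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Mirrored tables land in OneLake as Delta tables, so plain Spark SQL
# reads them with no pipeline code in between.
fresh_orders = spark.sql("""
    SELECT order_id, status, updated_at
    FROM sales_mirror.dbo.orders
    WHERE updated_at >= current_date()
""")
fresh_orders.show(10)
```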
What it does not do:
- It is not two-way — you can’t send writes or DDL changes back to the source.
- It does not handle every possible schema change gracefully; some DDL operations can break or stall replication.
- It doesn’t automatically preserve or replicate business logic embedded in stored procedures, triggers, or other side effects.
- It doesn’t eliminate all complexity — you still need to think about governance, latency, schema drift, and the cost impacts of change.
In short: mirroring is a powerful tool, but not a magic wand.
The Fine Print That Trips Up Pilots
Below are the common, under-documented constraints and “gotchas” that tend to surface when moving beyond toy datasets.
Table-count and scale caps
- Snowflake → 500 tables limit: When mirroring from Snowflake, only up to 500 tables in a database can be mirrored. Any beyond that are ignored.
- SQL database “preview” mode → 1,000 tables: For SQL sources, the current preview supports mirroring up to 1,000 tables.
- Many teams adopt a "mirror everything" approach, assuming the system will scale, only to find entire subsets of tables silently missing or unexplained gaps in data lineage. A quick pre-flight table count (sketched below) surfaces this before your users do.
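A table-count pre-flight check is cheap insurance against silent truncation. Here is a minimal sketch, assuming a pyodbc-reachable source; the DSN and the cap value are placeholders for your platform's values.

```python
# Minimal pre-flight check: count base tables at the source before enabling
# mirroring, so a platform cap (e.g. 500 tables) doesn't silently drop data.
# "DSN=source_db" and TABLE_CAP are placeholders for your environment.
import pyodbc

TABLE_CAP = 500  # documented cap for your source platform

conn = pyodbc.connect("DSN=source_db")
cursor = conn.cursor()
cursor.execute(
    "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE'"
)
table_count = cursor.fetchone()[0]

if table_count > TABLE_CAP:
    excess = table_count - TABLE_CAP
    print(f"WARNING: {table_count} tables found; {excess} would be skipped.")
```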
Schema evolution and DDL fragility
- Renaming a schema or table at the source (or in the inclusion/exclusion configuration) is poorly supported on some paths (e.g., Databricks).
- Adding, dropping, or altering columns can break downstream models or cause schema mismatches.
- Certain constructs (JSON fields, wildcard property names, whitespace in property keys) are unsupported by some mirrors (e.g., Cosmos DB).
- You can't rely on downstream models to survive arbitrary schema drift; they will likely break or require frequent fixes. A lightweight drift check (sketched below) catches mismatches before your dashboards do.
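Since arbitrary drift will eventually bite, it pays to detect it before the mirror or a downstream model does. Below is a minimal sketch, assuming you saved a JSON schema snapshot (`schema_snapshot.json`, a hypothetical path) when the mirror was configured and diff the live schema against it on a schedule.

```python
# Minimal drift check: snapshot the source schema when the mirror is set up,
# then diff the live schema against it on a schedule.
# "DSN=source_db" and "schema_snapshot.json" are hypothetical placeholders.
import json
import pyodbc

def fetch_schema(cursor):
    """Return {table: {column: data_type}} from INFORMATION_SCHEMA."""
    cursor.execute(
        "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE "
        "FROM INFORMATION_SCHEMA.COLUMNS"
    )
    schema = {}
    for table, column, dtype in cursor.fetchall():
        schema.setdefault(table, {})[column] = dtype
    return schema

conn = pyodbc.connect("DSN=source_db")
current = fetch_schema(conn.cursor())

with open("schema_snapshot.json") as f:  # captured at mirror setup time
    baseline = json.load(f)

for table, cols in current.items():
    if table not in baseline:
        print(f"NEW TABLE (not mirrored?): {table}")
        continue
    added = set(cols) - set(baseline[table])
    dropped = set(baseline[table]) - set(cols)
    if added or dropped:
        print(f"DRIFT in {table}: +{sorted(added)} -{sorted(dropped)}")
```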
Source-specific and platform limitations
- SQL Server: In an availability group, only the database on the primary replica can be mirrored.
- Cross-tenant limitations: If your Azure SQL Database is in a different Entra tenant than the Fabric workspace, mirroring currently isn’t supported.
- Databricks mirroring: Schema renames are not supported when added to the inclusion/exclusion list. That means if your warehouse reorganizes or renames schemas/tables, the mirror may not adjust.
Performance, latency, and “catch-up debt”
- High-volume change bursts can cause backlog in the mirror engine, meaning “near real time” begins to degrade under load.
- As schema drift accumulates, incremental synchronization becomes less efficient.
- The internal delta/merge process can consume compute and storage unexpectedly during heavy churn periods; budget for ebbs and spikes, and measure replication lag yourself rather than trusting "near real time" (see the probe sketched below).
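Rather than taking "near real time" on faith, you can measure it. The sketch below uses a common CDC heartbeat pattern: insert a row at the source, then poll the mirrored copy through its SQL endpoint and time its arrival. The `dbo.heartbeat` table and both DSNs are assumptions you would adapt.

```python
# Minimal replication-lag probe using a heartbeat pattern. The DSNs and
# the dbo.heartbeat table are assumptions; the mirrored side is queried
# through its SQL analytics endpoint.
import time
import uuid
import pyodbc

source = pyodbc.connect("DSN=source_db")            # hypothetical DSN
mirror = pyodbc.connect("DSN=fabric_sql_endpoint")  # hypothetical DSN

beat_id = str(uuid.uuid4())
sent_at = time.time()
source.execute("INSERT INTO dbo.heartbeat (id) VALUES (?)", beat_id)
source.commit()

# Poll the mirror until the heartbeat row appears, or give up after 10 min.
while time.time() - sent_at < 600:
    row = mirror.execute(
        "SELECT 1 FROM dbo.heartbeat WHERE id = ?", beat_id
    ).fetchone()
    if row:
        print(f"Replication lag: {time.time() - sent_at:.1f}s")
        break
    time.sleep(5)
else:
    print("Heartbeat not replicated within 10 minutes; investigate backlog.")
```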
Governance and lineage surprises
- Mirrored datasets often appear as new artifacts in Fabric rather than inheriting explicit lineage from the source. This disconnect can confuse users and complicate data cataloging efforts.
- If users see new datasets "floating" in Fabric without clear ownership, trust and adoption suffer. A minimal, manually curated lineage map (sketched below) goes a long way here.
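One pragmatic mitigation is a small, curated registry that ties each mirrored artifact back to a source and an owner. A minimal sketch follows; all names are hypothetical, and in practice this would more likely live in YAML or your catalog tool than in code.

```python
# Minimal sketch of a manually curated lineage registry. All names are
# hypothetical; in practice this would likely live in YAML or a catalog tool.
LINEAGE = {
    "sales_mirror.dbo.orders": {
        "source": "prod-sql-01/SalesDB/dbo.orders",
        "owner": "commerce-data-team",
        "freshness_sla": "15m",
    },
    "sales_mirror.dbo.customers": {
        "source": "prod-sql-01/SalesDB/dbo.customers",
        "owner": "crm-platform-team",
        "freshness_sla": "1h",
    },
}

def describe(artifact: str) -> str:
    """Answer the 'where did this come from?' question users actually ask."""
    entry = LINEAGE.get(artifact)
    if entry is None:
        return f"{artifact}: UNKNOWN lineage; do not build on this yet."
    return f"{artifact}: mirrored from {entry['source']} (owner: {entry['owner']})"

print(describe("sales_mirror.dbo.orders"))
```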
How to Frame Expectations for Stakeholders
One of the most damaging mistakes is overselling mirroring as a “plug-and-play, zero-ETL miracle.” Instead, calibrate expectations early by communicating these realities:
- Mirroring is powerful — but fragile. It works best when your source schema is fairly stable and under control.
- Not all tables will survive the jump. Some will need manual reconciliation, others might be excluded automatically.
- You will face schema maintenance overhead. You’ll need processes (or tools) to manage drift, mapping, and remapping.
- Latency is not magical. Mirroring helps reduce the freshness gap, but under heavy change volumes it will lag.
- Governance must be augmented, not assumed. Because lineage isn’t always clean, you’ll want catalog-level mapping and operational discipline.
A Pilot-Readiness Checklist
| Focus | Question | Flag to Watch |
|---|---|---|
| Schema Volatility | Does the source undergo frequent DDL changes (weekly/monthly)? | Frequent schema changes = fragility. |
| Table Scope | Are you mirroring only the tables you truly need for your pilot analytics? | Mirror subset, not full DB. |
| Type Compatibility | Do your table types (JSON, VARIANT, arrays) map cleanly into delta/parquet? | Avoid exotic types in pilot phase. |
| Business Logic Footprint | Any logic in triggers or stored procedures you depend on? | Mirror won’t capture them — you’ll need to recreate. |
| Latency Budget | What’s your acceptable freshness window? | If you need <1 sec, mirroring may not suffice under load. |
| Governance Plan | How will lineage be tracked and communicated? | Expect manual curation or catalog mapping. |
| Fallback | Do you have a "resync" or mirror-restart path in case of breakage? | Be ready to reinitialize parts if drift breaks things. |
Summary & Forward View
Microsoft Fabric’s mirroring is a highly compelling feature — it lowers the friction of connecting operational systems to analytics without building full ETL pipelines. But in practice, the fine print matters. Table count limits, schema drift sensitivity, source restrictions, and lineage opacity are exactly what derail promising pilots.
The difference between a successful pilot and a pile of broken dashboards often lies in preparation: knowing which tables to mirror, assessing schema stability, and having upstream plans for handling changes gracefully.
If I were advising a team launching a pilot today, I’d say: start small. Mirror just the critical subset of tables. Lock down your schema for the pilot period. Build a small transformation layer to isolate schema changes. Document lineage rigorously. And instrument latency and breakage detection from day one.
At the end of the day, mirroring is a powerful lever — but you don’t want that lever snapping halfway through your roll-out.