How It Works¶
The Performance Check advisor runs a multi-phase pipeline (Phases 0–6) that detects the warehouse edition, then scans for data-type anti-patterns, caching misconfigurations, V-Order issues, statistics health problems, collation mismatches, and query regressions.
Architecture Overview¶
┌────────────────────────────────────────────────────────────────┐
│ Fabric Notebook │
│ │
│ ├─ Phase 0: Edition Detection → DATABASEPROPERTYEX() │
│ ├─ Phase 1: Data Types → INFORMATION_SCHEMA.COLUMNS │
│ ├─ Phase 2: Caching → sys.databases, queryinsights│
│ ├─ Phase 3: V-Order → sys.databases │
│ ├─ Phase 4: Statistics → sys.stats, DBCC SHOW_STATS │
│ ├─ Phase 5: Collation → sys.columns, sys.databases │
│ └─ Phase 6: Query Regression → queryinsights.exec_requests │
│ │
│ All SQL runs via T-SQL passthrough (no data transferred to │
│ Spark — only metadata and aggregates) │
└────────────────────────────────────────────────────────────────┘
Phase 0: Warehouse Edition Detection¶
Detects whether the connected item is a DataWarehouse or a LakeWarehouse (SQL Analytics Endpoint).
This is the gating check — it determines which subsequent phases are applicable:
| Edition | Data Types | Caching | V-Order | Statistics | Collation | Query Regression |
|---|---|---|---|---|---|---|
| DataWarehouse | Yes | Yes | Yes | Yes | Yes | Yes |
| LakeWarehouse | Yes | Yes | Skipped | Yes | Yes | Yes |
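The gating behaviour in the table above can be sketched as a small helper. This is an illustrative sketch, not the advisor's actual code; the phase names and function are assumptions.

```python
# Hypothetical sketch of the Phase 0 gating logic: map the detected
# edition to the set of applicable phases (names are illustrative).
ALL_PHASES = ["data_types", "caching", "v_order", "statistics",
              "collation", "query_regression"]

def applicable_phases(edition: str) -> list[str]:
    """Return the phases to run for a given warehouse edition."""
    if edition == "LakeWarehouse":
        # V-Order is a write-time setting; a SQL Analytics Endpoint is
        # read-only over Lakehouse tables, so the check is skipped.
        return [p for p in ALL_PHASES if p != "v_order"]
    return list(ALL_PHASES)
```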
Phase 1: Data Type Analysis¶
Reads INFORMATION_SCHEMA.COLUMNS joined with INFORMATION_SCHEMA.TABLES
to scan every column in user tables for data-type anti-patterns.
This phase uses a single T-SQL query that returns all column metadata, then applies 9 configurable heuristic checks locally:
- `VARCHAR(MAX)` → CRITICAL
- Oversized `VARCHAR(n)` → WARNING
- `CHAR(n)` where `VARCHAR(n)` is better → WARNING
- `DECIMAL`/`NUMERIC` with excessive precision → WARNING
- `FLOAT`/`REAL` on monetary-sounding columns → WARNING
- `BIGINT` for small-range values → INFO
- Date/time data stored as strings → WARNING
- Nullable columns that should be `NOT NULL` → INFO
Column names are matched against regex patterns to detect semantic
mismatches (e.g., a FLOAT column named total_amount).
Scope filtering (schema_names, table_names, min_row_count) is
applied before analysis.
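The name-pattern matching described above can be sketched roughly as follows. The regex, type set, and finding shape are assumptions for illustration, not the advisor's exact rules.

```python
import re

# Illustrative sketch of one semantic heuristic: flag approximate
# (FLOAT/REAL) columns whose names suggest monetary values.
MONEY_NAME = re.compile(r"amount|price|cost|total|revenue", re.IGNORECASE)
APPROX_TYPES = {"float", "real"}

def check_monetary_float(column_name: str, data_type: str):
    """Return a finding dict, or None if the column passes the check."""
    if data_type.lower() in APPROX_TYPES and MONEY_NAME.search(column_name):
        return {"level": "WARNING",
                "message": f"{column_name} ({data_type}) looks monetary; "
                           "prefer DECIMAL for exact arithmetic"}
    return None
```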
Phase 2: Caching Analysis¶
Checks result-set caching configuration and analyses cold-start patterns.
Sub-check 2a: Result Cache Status¶
- Caching enabled → INFO (healthy)
- Caching disabled → WARNING with an `ALTER DATABASE` fix
Sub-check 2b: Cold Start & Cache Hits¶
Queries queryinsights.exec_requests_history over a configurable
lookback window (cold_start_lookback_hours, default: 24h):
- Aggregates `result_cache_hit` values (0 = not cacheable, 1 = created, 2 = hit)
- Computes the cache hit ratio → WARNING if below threshold
- Counts queries with `data_scanned_remote_storage_mb > 0` (cold starts)
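A minimal sketch of this aggregation, assuming each query row carries `result_cache_hit` and `data_scanned_remote_storage_mb` fields as returned by the view (the function and row shape are illustrative):

```python
# Sketch of the Sub-check 2b aggregation over exec_requests_history rows.
def summarize_cache(rows):
    # Only rows with result_cache_hit 1 (created) or 2 (hit) are cacheable.
    cacheable = [r for r in rows if r["result_cache_hit"] in (1, 2)]
    hits = sum(1 for r in cacheable if r["result_cache_hit"] == 2)
    hit_ratio = hits / len(cacheable) if cacheable else 0.0
    # Any remote-storage scan indicates a cold start.
    cold_starts = sum(1 for r in rows
                      if r["data_scanned_remote_storage_mb"] > 0)
    return {"hit_ratio": hit_ratio, "cold_starts": cold_starts}
```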
Phase 3: V-Order Check¶
Checks the V-Order write-time optimization state.
- Auto-skipped for LakeWarehouse editions
- V-Order enabled → INFO
- V-Order disabled → CRITICAL (irreversible setting)
V-Order is irreversible
Disabling V-Order cannot be undone. If V-Order is already disabled, the advisor recommends evaluating whether the warehouse is used for staging (acceptable) or analytics (problematic).
Phase 4: Statistics Health¶
The most comprehensive phase, with four sub-checks:
Sub-check 4a: Database Configuration¶
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases WHERE database_id = DB_ID()
- Auto-create or auto-update disabled → CRITICAL
Sub-check 4b: Proactive Statistics Refresh¶
- Proactive refresh disabled → WARNING
Sub-check 4c: Statistics Staleness & Row Drift¶
Queries sys.stats joined with sys.objects, sys.stats_columns, and
sys.columns for all user tables. For each statistics object:
- Staleness: compares `STATS_DATE()` against the configurable threshold (`stale_stats_threshold_days`, default: 7 days)
- Row drift: uses `DBCC SHOW_STATISTICS ... WITH STAT_HEADER` to compare the statistics row-count estimate against the actual row count obtained via `COUNT_BIG(*)` (which Fabric resolves from columnstore metadata). The earlier approach using `sys.partitions` was replaced because it returned `NULL` on Fabric Warehouse. Drift above `row_drift_pct_threshold` (default: 20%) triggers a finding.
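The staleness and drift tests above reduce to two comparisons per statistics object. A sketch under the documented defaults (the function name and parameters are assumptions):

```python
from datetime import datetime, timedelta

# Illustrative Sub-check 4c logic: given a stat's last update time, its
# row-count estimate, and the actual row count, return any findings.
def stats_findings(stats_date, stat_rows, actual_rows,
                   stale_days=7, drift_pct=20.0, now=None):
    now = now or datetime.now()
    findings = []
    if now - stats_date > timedelta(days=stale_days):
        findings.append("stale")
    if actual_rows:
        drift = abs(actual_rows - stat_rows) / actual_rows * 100
        if drift > drift_pct:
            findings.append("row_drift")
    return findings
```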
Sub-check 4d: Tables Without Statistics¶
Identifies user tables that have no statistics objects at all — typically new tables that haven't been queried yet.
Phase 5: Collation Consistency¶
Checks column-level collation against the database default collation. Mismatched collation can cause implicit conversions in joins and comparisons, preventing predicate push-down.
- All columns match database collation → INFO
- Column collation differs → WARNING
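The comparison itself is straightforward; a sketch, assuming column metadata arrives as `(table, column, collation)` tuples (shape and function name are illustrative):

```python
# Illustrative Phase 5 check: flag columns whose collation differs from
# the database default. Columns with no explicit collation are skipped.
def collation_findings(db_collation, columns):
    return [{"level": "WARNING",
             "column": f"{tbl}.{col}",
             "message": f"collation {coll} differs from {db_collation}"}
            for tbl, col, coll in columns
            if coll and coll != db_collation]
```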
Phase 6: Query Regression Detection¶
Compares recent query performance against a historical baseline
using queryinsights.exec_requests_history.
Warehouse-wide
This check runs warehouse-wide and is not filtered by
schema_names / table_names selections.
The 30-day Query Insights retention window is split into two periods:
- Baseline: days 8–30 (configurable via `regression_lookback_days`)
- Recent: last 7 days (default)
Query shapes are identified by query_hash. Both windows require a
minimum number of executions (regression_min_executions, default: 3)
to filter noise.
| Regression factor | Level |
|---|---|
| ≥ 5× baseline | CRITICAL |
| ≥ 2× baseline | WARNING |
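Putting the windowing, minimum-execution filter, and thresholds together, the per-query-shape classification can be sketched as follows (function and parameter names are assumptions; only the thresholds and the default `regression_min_executions` come from the text above):

```python
# Illustrative Phase 6 classification for one query_hash: compare the
# recent median duration against the baseline median.
def classify_regression(baseline_ms, recent_ms,
                        baseline_runs, recent_runs, min_executions=3):
    # Both windows need enough executions to filter noise.
    if baseline_runs < min_executions or recent_runs < min_executions:
        return None
    if baseline_ms <= 0:
        return None
    factor = recent_ms / baseline_ms
    if factor >= 5:
        return ("CRITICAL", factor)
    if factor >= 2:
        return ("WARNING", factor)
    return None  # within normal variation
```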
Data Flow¶
Phase 0 (edition) ──────────────── gates Phase 3
Phase 1 (data types) ──────────┐
Phase 2 (caching) ─────────────┤
Phase 3 (V-Order) ─────────────┤
Phase 4 (statistics) ──────────┼──► Report Generation
Phase 5 (collation) ───────────┤ │
Phase 6 (query regression) ────┘ ▼
PerformanceCheckResult
├── findings[]
├── summary (CheckSummary)
├── text_report
├── markdown_report
└── html_report
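How findings roll up into the summary shown above can be sketched like this; the real `PerformanceCheckResult`/`CheckSummary` fields may differ, so treat the shapes as assumptions:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical summary roll-up: count findings by severity level.
@dataclass
class CheckSummary:
    critical: int = 0
    warning: int = 0
    info: int = 0

def summarize(findings):
    counts = Counter(f["level"] for f in findings)
    return CheckSummary(critical=counts.get("CRITICAL", 0),
                        warning=counts.get("WARNING", 0),
                        info=counts.get("INFO", 0))
```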
Performance Characteristics¶
| Phase | Method | Data Transfer | Speed |
|---|---|---|---|
| 0. Edition | T-SQL passthrough | ~1 row | Instant |
| 1. Data Types | T-SQL passthrough | Column metadata (~KB) | Fast |
| 2. Caching | T-SQL passthrough | Aggregated stats (~KB) | Fast |
| 3. V-Order | T-SQL passthrough | ~1 row | Instant |
| 4. Statistics | T-SQL passthrough + DBCC | Metadata + stat headers | Fast |
| 5. Collation | T-SQL passthrough | Column metadata (~KB) | Fast |
| 6. Query Regression | T-SQL passthrough | Aggregated medians (~KB) | Fast |
No user data is ever transferred to Spark — only metadata, counts, and aggregates.