How It Works¶
The Performance Check advisor runs a multi-phase pipeline (Phases 0–7) that detects the warehouse edition, then scans for caching misconfigurations, V-Order issues, query regressions, data-type anti-patterns, statistics health problems, collation mismatches, and Custom SQL Pools configuration.
Phases are split into two groups: warehouse-level checks (Phases 0–3) run first without scope filtering, then table scope is resolved, and table-scoped checks (Phases 4–6) run on the filtered set. Phase 7 (Custom SQL Pools) uses the Fabric REST API at the workspace level.
Architecture Overview¶
PerformanceCheckAdvisor.run()
|
Warehouse-level checks (no scope filtering):
├─ Phase 0: Edition Detection → DATABASEPROPERTYEX()
├─ Phase 1: Caching → sys.databases, queryinsights
├─ Phase 2: V-Order → sys.databases
├─ Phase 3: Query Regression → queryinsights.exec_requests
│
Scope resolution (schema_names / table_names filtering)
│
Table-scoped checks:
├─ Phase 4: Data Types → INFORMATION_SCHEMA.COLUMNS
├─ Phase 5: Statistics → sys.stats, DBCC SHOW_STATS
├─ Phase 6: Collation → sys.columns, sys.databases
│
Workspace-level check (REST API + T-SQL):
└─ Phase 7: Custom SQL Pools → Fabric REST API, queryinsights
Note
All SQL runs via T-SQL passthrough (no data transferred to
Spark — only metadata and aggregates)
Phase 0: Warehouse Edition Detection¶
Detects whether the connected item is a DataWarehouse or a LakeWarehouse (SQL Analytics Endpoint).
This is the gating check — it determines which subsequent phases are applicable:
| Edition | Caching | V-Order | Query Regression | Data Types | Statistics | Collation | Custom SQL Pools |
|---|---|---|---|---|---|---|---|
| DataWarehouse | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| LakeWarehouse | Yes | Skipped | Yes | Yes | Yes | Yes | Yes |
Phase 1: Caching Analysis¶
Checks result-set caching configuration and analyses cold-start patterns.
Sub-check 1a: Result Cache Status¶
- Caching enabled → INFO (healthy)
- Caching disabled → WARNING with
ALTER DATABASEfix
Sub-check 1b: Cold Start & Cache Hits¶
Queries queryinsights.exec_requests_history over a configurable
lookback window (cold_start_lookback_hours, default: 24h):
- Aggregates
result_cache_hitvalues (0 = not cacheable, 1 = created, 2 = hit) - Computes cache hit ratio → WARNING if below threshold
- Counts queries with
data_scanned_remote_storage_mb > 0(cold starts)
Phase 2: V-Order Check¶
Checks the V-Order write-time optimization state.
- Auto-skipped for LakeWarehouse editions
- V-Order enabled → INFO
- V-Order disabled → CRITICAL (irreversible setting)
V-Order is irreversible
Disabling V-Order cannot be undone. If V-Order is already disabled, the advisor recommends evaluating whether the warehouse is used for staging (acceptable) or analytics (problematic).
Phase 3: Query Regression Detection¶
Compares recent query performance against a historical baseline
using queryinsights.exec_requests_history.
Warehouse-wide
This check runs warehouse-wide and is not filtered by
schema_names / table_names selections.
The 30-day Query Insights retention window is split into two periods:
- Baseline: days 8–30 (configurable via
regression_lookback_days) - Recent: last 7 days (default)
Query shapes are identified by query_hash. Both windows require a
minimum number of executions (regression_min_executions, default: 3)
to filter noise.
| Regression factor | Level |
|---|---|
| ≥ 5× baseline | CRITICAL |
| ≥ 2× baseline | WARNING |
Phase 4: Data Type Analysis¶
Reads INFORMATION_SCHEMA.COLUMNS joined with INFORMATION_SCHEMA.TABLES
to scan every column in user tables for data-type anti-patterns.
This phase uses a single T-SQL query that returns all column metadata, then applies 9 configurable heuristic checks locally:
VARCHAR(MAX)→ CRITICAL- Oversized
VARCHAR(n)→ WARNING CHAR(n)whereVARCHAR(n)is better → WARNINGDECIMAL/NUMERICwith excessive precision → WARNINGFLOAT/REALon monetary-sounding columns → WARNINGBIGINTfor small-range values → INFO- Date/time data stored as strings → WARNING
- Nullable columns that should be
NOT NULL→ INFO
Column names are matched against regex patterns to detect semantic
mismatches (e.g., a FLOAT column named total_amount).
Scope filtering (schema_names, table_names, min_row_count) is
applied before analysis.
Phase 5: Statistics Health¶
The most comprehensive phase, with four sub-checks:
Sub-check 5a: Database Configuration¶
- Auto-create or auto-update disabled → CRITICAL
Sub-check 5b: Proactive Statistics Refresh¶
- Proactive refresh disabled → WARNING
Sub-check 5c: Statistics Staleness & Row Drift¶
Queries sys.stats joined with sys.objects, sys.stats_columns, and
sys.columns for all user tables. For each statistics object:
- Staleness: compares
STATS_DATE()against the configurable threshold (stale_stats_threshold_days, default: 7 days) - Row drift: uses
DBCC SHOW_STATISTICS ... WITH STAT_HEADERto compare the statistics row count estimate against the actual row count obtained viaCOUNT_BIG(*)(which Fabric resolves from columnstore metadata). The earlier approach usingsys.partitionswas replaced because it returnedNULLon Fabric Warehouse. Drift aboverow_drift_pct_threshold(default: 20%) triggers a finding.
Sub-check 5d: Tables Without Statistics¶
Identifies user tables that have no statistics objects at all — typically new tables that haven't been queried yet.
Phase 6: Collation Consistency¶
Checks column-level collation against the database default collation. Mismatched collation can cause implicit conversions in joins and comparisons, preventing predicate push-down.
- All columns match database collation → INFO
- Column collation differs → WARNING
Phase 7: Custom SQL Pools¶
Analyses the Custom SQL Pools configuration for the workspace and monitors pool pressure.
Configuration checks¶
- Feature enabled/disabled — awareness of whether the workspace uses Custom SQL Pools or default autonomous workload management
- Resource allocation sum ≠ 100% — misconfigured pools
- Low resource allocation per pool — risk on capacity SKU downscale
- Single pool dominance — ≥ 90% allocated to one pool
- Empty classifier values — pools with no routing rules
- Read optimization heuristic — suggests enabling for reporting pools (detected by name or classifier patterns)
- Pool count near limit — approaching the 8-pool maximum
Runtime checks¶
- Pool under pressure — queries
queryinsights.sql_pool_insightsforis_pool_under_pressure = 1events over a configurable lookback window - Unclassified traffic — compares
program_namevalues fromqueryinsights.exec_requests_historyagainst pool classifiers to find traffic not routed to any pool - Known Fabric app patterns — detects well-known Fabric application names (Pipelines, Power BI, SQL Query Editor) that are not matched by any classifier
Performance Characteristics¶
| Phase | Method | Data Transfer | Speed |
|---|---|---|---|
| 0. Edition | T-SQL passthrough | ~1 row | Instant |
| 1. Caching | T-SQL passthrough | Aggregated stats (~KB) | Fast |
| 2. V-Order | T-SQL passthrough | ~1 row | Instant |
| 3. Query Regression | T-SQL passthrough | Aggregated medians (~KB) | Fast |
| 4. Data Types | T-SQL passthrough | Column metadata (~KB) | Fast |
| 5. Statistics | T-SQL passthrough + DBCC | Metadata + stat headers | Fast |
| 6. Collation | T-SQL passthrough | Column metadata (~KB) | Fast |
| 7. Custom SQL Pools | REST API + T-SQL passthrough | Config JSON + aggregates (~KB) | Fast |
Info
No user data is ever transferred to Spark — only metadata, counts, and aggregates.