
How It Works

The Performance Check advisor runs a multi-phase pipeline (Phases 0–7) that detects the warehouse edition, then scans for caching misconfigurations, V-Order issues, query regressions, data-type anti-patterns, statistics health problems, and collation mismatches, and reviews the workspace's Custom SQL Pools configuration.

Phases are split into two groups: warehouse-level checks (Phases 0–3) run first without scope filtering, then table scope is resolved, and table-scoped checks (Phases 4–6) run on the filtered set. Phase 7 (Custom SQL Pools) uses the Fabric REST API at the workspace level.

Architecture Overview

PerformanceCheckAdvisor.run()
  |
  Warehouse-level checks (no scope filtering):                      
  ├─ Phase 0: Edition Detection   → DATABASEPROPERTYEX()            
  ├─ Phase 1: Caching             → sys.databases, queryinsights    
  ├─ Phase 2: V-Order             → sys.databases                   
├─ Phase 3: Query Regression    → queryinsights.exec_requests_history
  Scope resolution  (schema_names / table_names filtering)          
  Table-scoped checks:                                              
  ├─ Phase 4: Data Types          → INFORMATION_SCHEMA.COLUMNS      
├─ Phase 5: Statistics          → sys.stats, DBCC SHOW_STATISTICS
  ├─ Phase 6: Collation           → sys.columns, sys.databases      
  Workspace-level check (REST API + T-SQL):                         
  └─ Phase 7: Custom SQL Pools    → Fabric REST API, queryinsights  

Note

All SQL runs via T-SQL passthrough (no data is transferred to Spark; only metadata and aggregates).

Phase 0: Warehouse Edition Detection

Detects whether the connected item is a DataWarehouse or a LakeWarehouse (SQL Analytics Endpoint).
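
A minimal sketch of the probe (the advisor's exact query may differ):

  SELECT DATABASEPROPERTYEX(DB_NAME(), 'Edition') AS edition;
  -- Returns 'DataWarehouse' for a Fabric Warehouse,
  -- 'LakeWarehouse' for a SQL Analytics Endpoint.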

This is the gating check — it determines which subsequent phases are applicable:

Edition        Caching  V-Order  Query Regression  Data Types  Statistics  Collation  Custom SQL Pools
DataWarehouse  Yes      Yes      Yes               Yes         Yes         Yes        Yes
LakeWarehouse  Yes      Skipped  Yes               Yes         Yes         Yes        Yes

Phase 1: Caching Analysis

Checks result-set caching configuration and analyses cold-start patterns.

Sub-check 1a: Result Cache Status

  • Caching enabled → INFO (healthy)
  • Caching disabled → WARNING with ALTER DATABASE fix
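
A minimal sketch of the status probe and the suggested fix, assuming the is_result_set_caching_on flag and the RESULT_SET_CACHING option behave on Fabric Warehouse as on other Microsoft SQL surfaces:

  -- Check whether result-set caching is enabled for the current database
  SELECT name, is_result_set_caching_on
  FROM sys.databases
  WHERE name = DB_NAME();

  -- Suggested fix when disabled:
  -- ALTER DATABASE <warehouse_name> SET RESULT_SET_CACHING ON;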

Sub-check 1b: Cold Start & Cache Hits

Queries queryinsights.exec_requests_history over a configurable lookback window (cold_start_lookback_hours, default: 24h); a sketch of the aggregation follows the list:

  • Aggregates result_cache_hit values (0 = not cacheable, 1 = created, 2 = hit)
  • Computes cache hit ratio → WARNING if below threshold
  • Counts queries with data_scanned_remote_storage_mb > 0 (cold starts)
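
A minimal sketch, using the result_cache_hit semantics listed above and the 24h default window:

  SELECT
      COUNT(*)                                               AS total_queries,
      SUM(CASE WHEN result_cache_hit = 2 THEN 1 ELSE 0 END)  AS cache_hits,
      SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END)  AS cache_creates,
      SUM(CASE WHEN data_scanned_remote_storage_mb > 0
               THEN 1 ELSE 0 END)                            AS cold_starts
  FROM queryinsights.exec_requests_history
  WHERE start_time >= DATEADD(HOUR, -24, GETUTCDATE());  -- cold_start_lookback_hours
  -- The hit ratio (hits vs. cacheable executions) is computed client-side.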

Phase 2: V-Order Check

Checks the V-Order write-time optimization state.

  • Auto-skipped for LakeWarehouse editions
  • V-Order enabled → INFO
  • V-Order disabled → CRITICAL (irreversible setting)

V-Order is irreversible

Disabling V-Order cannot be undone. If V-Order is already disabled, the advisor recommends evaluating whether the warehouse is used for staging (acceptable) or analytics (problematic).
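
A minimal sketch of the check via the is_vorder_enabled flag in sys.databases:

  -- 1 = V-Order enabled (default); 0 = disabled (irreversible, per the note above)
  SELECT name, is_vorder_enabled
  FROM sys.databases
  WHERE name = DB_NAME();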

Phase 3: Query Regression Detection

Compares recent query performance against a historical baseline using queryinsights.exec_requests_history.

Warehouse-wide

This check runs warehouse-wide and is not filtered by schema_names / table_names selections.

The 30-day Query Insights retention window is split into two periods:

  • Baseline: days 8–30 (configurable via regression_lookback_days)
  • Recent: last 7 days (default)

Query shapes are identified by query_hash. Both windows require a minimum number of executions (regression_min_executions, default: 3) to filter noise.

Regression factor  Level
≥ 5× baseline      CRITICAL
≥ 2× baseline      WARNING
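
A sketch of the comparison under the default windows, substituting AVG for the advisor's medians and assuming total_elapsed_time_ms as the duration column:

  WITH baseline AS (
      SELECT query_hash, AVG(total_elapsed_time_ms) AS elapsed_ms
      FROM queryinsights.exec_requests_history
      WHERE start_time >= DATEADD(DAY, -30, GETUTCDATE())
        AND start_time <  DATEADD(DAY, -7,  GETUTCDATE())
      GROUP BY query_hash
      HAVING COUNT(*) >= 3                    -- regression_min_executions
  ),
  recent AS (
      SELECT query_hash, AVG(total_elapsed_time_ms) AS elapsed_ms
      FROM queryinsights.exec_requests_history
      WHERE start_time >= DATEADD(DAY, -7, GETUTCDATE())
      GROUP BY query_hash
      HAVING COUNT(*) >= 3
  )
  SELECT r.query_hash,
         b.elapsed_ms AS baseline_ms,
         r.elapsed_ms AS recent_ms,
         r.elapsed_ms * 1.0 / NULLIF(b.elapsed_ms, 0) AS regression_factor
  FROM recent r
  JOIN baseline b ON b.query_hash = r.query_hash
  WHERE r.elapsed_ms >= 2 * b.elapsed_ms;     -- 2x => WARNING, 5x => CRITICAL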

Phase 4: Data Type Analysis

Reads INFORMATION_SCHEMA.COLUMNS joined with INFORMATION_SCHEMA.TABLES to scan every column in user tables for data-type anti-patterns.

This phase uses a single T-SQL query that returns all column metadata (sketched at the end of this phase), then applies nine configurable heuristic checks locally, including:

  1. VARCHAR(MAX) → CRITICAL
  2. Oversized VARCHAR(n) → WARNING
  3. CHAR(n) where VARCHAR(n) is better → WARNING
  4. DECIMAL / NUMERIC with excessive precision → WARNING
  5. FLOAT / REAL on monetary-sounding columns → WARNING
  6. BIGINT for small-range values → INFO
  7. Date/time data stored as strings → WARNING
  8. Nullable columns that should be NOT NULL → INFO

Column names are matched against regex patterns to detect semantic mismatches (e.g., a FLOAT column named total_amount).

Scope filtering (schema_names, table_names, min_row_count) is applied before analysis.
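
A minimal sketch of the metadata query (scope and min_row_count predicates omitted); the heuristics run client-side over the result:

  SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME,
         c.DATA_TYPE, c.CHARACTER_MAXIMUM_LENGTH,
         c.NUMERIC_PRECISION, c.NUMERIC_SCALE, c.IS_NULLABLE
  FROM INFORMATION_SCHEMA.COLUMNS AS c
  JOIN INFORMATION_SCHEMA.TABLES  AS t
    ON  t.TABLE_SCHEMA = c.TABLE_SCHEMA
    AND t.TABLE_NAME   = c.TABLE_NAME
  WHERE t.TABLE_TYPE = 'BASE TABLE';
  -- e.g., CHARACTER_MAXIMUM_LENGTH = -1 marks VARCHAR(MAX) (check 1)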

Phase 5: Statistics Health

The most comprehensive phase, with four sub-checks:

Sub-check 5a: Database Configuration

  • Auto-create or auto-update disabled → CRITICAL
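
A minimal sketch, assuming the standard sys.databases auto-statistics flags apply on Fabric:

  SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
  FROM sys.databases
  WHERE name = DB_NAME();   -- either flag at 0 => CRITICAL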

Sub-check 5b: Proactive Statistics Refresh

  • Proactive refresh disabled → WARNING

Sub-check 5c: Statistics Staleness & Row Drift

Queries sys.stats joined with sys.objects, sys.stats_columns, and sys.columns for all user tables. For each statistics object:

  • Staleness: compares STATS_DATE() against the configurable threshold (stale_stats_threshold_days, default: 7 days)
  • Row drift: uses DBCC SHOW_STATISTICS ... WITH STAT_HEADER to compare the row count recorded in the statistics header against the actual row count obtained via COUNT_BIG(*) (which Fabric resolves from columnstore metadata); an earlier approach using sys.partitions was replaced because it returned NULL on Fabric Warehouse. Drift above row_drift_pct_threshold (default: 20%) triggers a finding. A sketch of these probes follows.
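
A sketch of the probes; dbo.fact_sales and stats_order_date are placeholder names:

  -- Staleness: last update per statistics object on user tables
  SELECT SCHEMA_NAME(o.schema_id) AS schema_name, o.name AS table_name,
         s.name AS stats_name,
         STATS_DATE(s.object_id, s.stats_id) AS last_updated
  FROM sys.stats   AS s
  JOIN sys.objects AS o ON o.object_id = s.object_id
  WHERE o.type = 'U';

  -- Row count recorded in the statistics header
  DBCC SHOW_STATISTICS ('dbo.fact_sales', 'stats_order_date') WITH STAT_HEADER;

  -- Actual row count, resolved from columnstore metadata on Fabric
  SELECT COUNT_BIG(*) FROM dbo.fact_sales;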

Sub-check 5d: Tables Without Statistics

Identifies user tables that have no statistics objects at all — typically new tables that haven't been queried yet.
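
A minimal sketch of the lookup:

  SELECT SCHEMA_NAME(t.schema_id) AS schema_name, t.name AS table_name
  FROM sys.tables AS t
  WHERE NOT EXISTS (SELECT 1 FROM sys.stats AS s
                    WHERE s.object_id = t.object_id);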

Phase 6: Collation Consistency

Checks column-level collation against the database default collation. Mismatched collation can cause implicit conversions in joins and comparisons, preventing predicate push-down.

  • All columns match database collation → INFO
  • Column collation differs → WARNING
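
A minimal sketch, reading the database default collation from sys.databases as described above:

  SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
         t.name AS table_name,
         c.name AS column_name,
         c.collation_name
  FROM sys.columns AS c
  JOIN sys.tables  AS t ON t.object_id = c.object_id
  WHERE c.collation_name IS NOT NULL
    AND c.collation_name <> (SELECT collation_name
                             FROM sys.databases
                             WHERE name = DB_NAME());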

Phase 7: Custom SQL Pools

Analyses the Custom SQL Pools configuration for the workspace and monitors pool pressure.

Configuration checks

  • Feature enabled/disabled — awareness of whether the workspace uses Custom SQL Pools or default autonomous workload management
  • Resource allocation sum ≠ 100% — misconfigured pools
  • Low resource allocation per pool — risk on capacity SKU downscale
  • Single pool dominance — ≥ 90% allocated to one pool
  • Empty classifier values — pools with no routing rules
  • Read optimization heuristic — suggests enabling for reporting pools (detected by name or classifier patterns)
  • Pool count near limit — approaching the 8-pool maximum

Runtime checks

  • Pool under pressure — queries queryinsights.sql_pool_insights for is_pool_under_pressure = 1 events over a configurable lookback window
  • Unclassified traffic — compares program_name values from queryinsights.exec_requests_history against pool classifiers to find traffic not routed to any pool
  • Known Fabric app patterns — detects well-known Fabric application names (Pipelines, Power BI, SQL Query Editor) that are not matched by any classifier
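
A sketch of the runtime probes; is_pool_under_pressure and program_name come from the descriptions above, while pool_name and snapshot_time are assumed column names in queryinsights.sql_pool_insights:

  -- Pool pressure events over the lookback window
  SELECT pool_name,                           -- assumed column name
         COUNT(*) AS pressure_events
  FROM queryinsights.sql_pool_insights
  WHERE is_pool_under_pressure = 1
    AND snapshot_time >= DATEADD(HOUR, -24, GETUTCDATE())   -- assumed column name
  GROUP BY pool_name;

  -- Unclassified traffic: client programs seen in query history,
  -- compared client-side against each pool's classifier values
  SELECT program_name, COUNT(*) AS executions
  FROM queryinsights.exec_requests_history
  GROUP BY program_name;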

Performance Characteristics

Phase                Method                          Data Transfer                    Speed
0. Edition           T-SQL passthrough               ~1 row                           Instant
1. Caching           T-SQL passthrough               Aggregated stats (~KB)           Fast
2. V-Order           T-SQL passthrough               ~1 row                           Instant
3. Query Regression  T-SQL passthrough               Aggregated medians (~KB)         Fast
4. Data Types        T-SQL passthrough               Column metadata (~KB)            Fast
5. Statistics        T-SQL passthrough + DBCC        Metadata + stat headers          Fast
6. Collation         T-SQL passthrough               Column metadata (~KB)            Fast
7. Custom SQL Pools  REST API + T-SQL passthrough    Config JSON + aggregates (~KB)   Fast

Info

No user data is ever transferred to Spark — only metadata, counts, and aggregates.