Data Clustering Advisor
The Data Clustering Advisor helps you answer a deceptively simple question: which columns in my Fabric Warehouse should I cluster?
Data Clustering is one of the most impactful performance levers in Microsoft Fabric Warehouse: it controls how data is physically organized on OneLake, which directly affects query speed and resource consumption. But choosing the right columns to cluster on isn't always obvious. That's where this advisor comes in.
What it does
The advisor analyses your actual query patterns (via Query Insights), combines them with table metadata and column cardinality estimates, and scores every candidate column from 0 to 100. You get a clear report telling you exactly what to cluster and why.
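To make the idea of a 0 to 100 score concrete, here is a purely illustrative sketch of how query usage and cardinality could be combined into one number. It is not the advisor's formula (the Scoring document describes that). The `ColumnStats` fields, the `toy_score` function, and the halving penalty for very low cardinality are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column inputs; not the advisor's real data model."""
    name: str
    predicate_hits: int    # assumed metric: times the column appeared in a filter
    distinct_ratio: float  # assumed metric: distinct values / row count, in [0, 1]


def toy_score(col, max_hits, low_cardinality_floor=0.001):
    """Return an illustrative 0-100 score. NOT the advisor's actual formula."""
    if max_hits <= 0:
        return 0.0
    # Usage share relative to the most frequently filtered column
    usage = min(col.predicate_hits / max_hits, 1.0)
    # Assumed penalty: halve the score when the column has very few distinct values
    penalty = 0.5 if col.distinct_ratio < low_cardinality_floor else 1.0
    return round(100.0 * usage * penalty, 1)


def rank_columns(columns):
    """Score every candidate column and sort from highest to lowest score."""
    max_hits = max((c.predicate_hits for c in columns), default=0)
    scored = [(c.name, toy_score(c, max_hits)) for c in columns]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ColumnStats("order_date", predicate_hits=80, distinct_ratio=0.02),
    ColumnStats("status", predicate_hits=100, distinct_ratio=0.00001),
    ColumnStats("note", predicate_hits=0, distinct_ratio=0.9),
]
ranked = rank_columns(candidates)
```

In this toy model a heavily filtered column with almost no distinct values (`status`) ranks below a slightly less used column with healthier cardinality (`order_date`). That is the kind of trade-off a combined score is meant to capture.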
Quick Start
from fabric_warehouse_advisor import DataClusteringAdvisor, DataClusteringConfig
config = DataClusteringConfig(
    warehouse_name="MyWarehouse",
)
advisor = DataClusteringAdvisor(spark, config)
result = advisor.run()
# To experience all features and interactive capabilities, save the report and open it in a web browser
result.save("/lakehouse/default/Files/reports/report.html")
# Rich HTML report
displayHTML(result.html_report)
Warning
Execution time and CU consumption of the Data Clustering Advisor vary with data volume and column count. For optimal performance and minimal impact, we recommend running the Advisor during low-usage periods and outside peak concurrency windows.
Working with Results
Web Browser is recommended
The best way to view the report is to save it as HTML and open it in a web browser, which provides the full experience with rich features and interactivity.
Exploring Scores
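As a minimal sketch, assuming `result.scores_df` is a Spark DataFrame (it is written with `.write` further down this page), the scores can be inspected with standard DataFrame calls. The helper name `preview_scores` is hypothetical. No column names are assumed here, since the schema is not listed on this page.

```python
def preview_scores(result, n=20):
    """Print the schema and the first n rows of the scores DataFrame.

    Assumes result.scores_df behaves like a Spark DataFrame.
    Returns the column names so they can be used in later filters.
    """
    scores_df = result.scores_df
    # Show which columns are available before filtering or sorting on them
    scores_df.printSchema()
    # Display rows without truncating long values
    scores_df.show(n, truncate=False)
    return list(scores_df.columns)


# Usage in a notebook, after advisor.run():
# columns = preview_scores(result, n=10)
```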
Saving Reports
result.save("/lakehouse/default/Files/reports/dataclustering_report.html")
result.save("/lakehouse/default/Files/reports/dataclustering_report.md", "md")
result.save("/lakehouse/default/Files/reports/dataclustering_report.txt", "txt")
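The three calls above can be wrapped in a small helper so that every format is written under one base path. This is a sketch only: `save_all_formats` is a hypothetical convenience function, and it relies solely on the `result.save(path)` and `result.save(path, format)` call shapes shown above.

```python
def save_all_formats(result, base_path):
    """Save the report as HTML, Markdown, and plain text under one base path.

    base_path is the target path without a file extension.
    Returns the list of paths that were written.
    """
    saved = []
    # HTML: called with the path only, as in the examples above
    html_path = f"{base_path}.html"
    result.save(html_path)
    saved.append(html_path)
    # Markdown and text: the format is passed as the second argument
    for fmt in ("md", "txt"):
        path = f"{base_path}.{fmt}"
        result.save(path, fmt)
        saved.append(path)
    return saved


# Usage in a notebook, after advisor.run():
# save_all_formats(result, "/lakehouse/default/Files/reports/dataclustering_report")
```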
Persisting data to a Delta table
result.scores_df.write.mode("overwrite").format("delta").saveAsTable(
    "yourschema.data_clustering_advisor_scores"
)
Documentation
| Document | Description |
|---|---|
| How It Works | Detailed analysis of each phase in the pipeline execution lifecycle |
| Configuration | Full parameter reference with defaults |
| Scoring | Scoring formula, cardinality penalties, worked examples |
| Reports | HTML, Text and Markdown report formats |
| Data Type Reference | Supported types and limitations |