Data Clustering Advisor¶
The Data Clustering Advisor helps you answer a deceptively simple question: which columns in my Fabric Warehouse should I cluster?
Data Clustering is one of the most impactful performance levers in Microsoft Fabric Warehouse — it controls how data is physically organized on OneLake, which directly affects query speed and resource consumption. But choosing the right columns to cluster on isn't always obvious. That's where this advisor comes in.
What it does¶
The advisor analyses your actual query patterns (via Query Insights), combines them with table metadata and column cardinality estimates, and scores every candidate column from 0 to 100. You get a clear report telling you exactly what to cluster and why.
Quick Start¶
from fabric_warehouse_advisor import DataClusteringAdvisor, DataClusteringConfig
config = DataClusteringConfig(
warehouse_name="MyWarehouse",
)
advisor = DataClusteringAdvisor(spark, config)
result = advisor.run()
# Rich HTML report — best way to view results in a Fabric notebook
displayHTML(result.html_report)
Working with Results¶
Exploring Scores¶
Programmatic Access¶
for rec in result.recommendations:
print(f"{rec.schema_name}.{rec.table_name}: {rec.cluster_by_ddl}")
for col in rec.recommended_columns:
print(f" {col.column_name} — score {col.composite_score}, {col.recommendation}")
Saving Reports¶
result.save("/lakehouse/default/Files/reports/clustering_report.html")
result.save("/lakehouse/default/Files/reports/clustering_report.md", "md")
result.save("/lakehouse/default/Files/reports/clustering_report.txt", "txt")
Persisting Scores to Delta¶
result.scores_df.write.mode("overwrite").format("delta").saveAsTable(
"yourschema.clustering_advisor_scores"
)
Documentation¶
| Document | Description |
|---|---|
| How It Works | Deep dive into the 7-phase pipeline |
| Configuration | Full parameter reference with defaults |
| Scoring | Scoring formula, cardinality penalties, worked examples |
| Reports | Text, Markdown, and HTML report formats |
| Data Type Reference | Supported types and limitations |