Reports¶
The advisor produces reports in three formats. All formats contain the same information — choose whichever fits your workflow.
Report Formats¶
| Format | Method | Best For |
|---|---|---|
| Text | result.text_report |
print() in a notebook cell; quick console overview |
| Markdown | result.markdown_report |
Saving as .md; rendering in GitHub, wikis, documentation |
| HTML | result.html_report |
displayHTML() in Fabric notebooks; rich visual display |
Viewing Reports¶
Text Report¶
The text report is not automatically printed at the end of advisor.run().
To print it:
HTML Report (Recommended)¶
The HTML report renders natively in a Fabric notebook cell:
It includes: - Summary cards (tables analysed, recommendations, etc.) - Per-table sections with sortable score tables - Visual score bars - Color-coded recommendation badges - Collapsible DDL sections - Best practices reference
Markdown Report¶
Saving Reports¶
Use the result.save() method or the standalone save_report() function:
# Via DataClusteringResult
result.save("/lakehouse/default/Files/reports/report.html") # HTML (default)
result.save("/lakehouse/default/Files/reports/report.md", "md") # Markdown
result.save("/lakehouse/default/Files/reports/report.txt", "txt") # Plain text
# Via standalone function
from fabric_warehouse_advisor import save_report
save_report(result.html_report, "/path/to/report.html", format="html")
save_report(result.markdown_report, "/path/to/report.md", format="md")
The format parameter accepts "html", "md", or "txt". When omitted,
it is inferred from the file extension.
For HTML format, if the content doesn't already contain <html> tags,
the save function wraps it in a minimal HTML document with UTF-8 encoding
and a title.
Parent directories are created automatically.
Report Sections¶
All three formats include the following sections:
Executive Summary¶
- Total tables analysed
- Tables with recommendations
- Tables already clustered
- Score threshold used
Per-Table Recommendations¶
For each table the report includes:
- Table name, schema, and row count
- Current
CLUSTER BYcolumns (if any) - Warnings for sub-optimal existing clustering
- Suggested CTAS DDL (when
generate_ctas=True)
Each table also contains a column-level detail table with:
- Column name and data type
- Predicate hits (weighted by query runs)
- Approximate distinct count, cardinality ratio, and percentage
- Cardinality classification (High/Medium/Low)
- Composite score (with visual bar in HTML)
- Recommendation label
- Optimization warnings (if applicable)
All Suggested DDL¶
A consolidated section with every CTAS statement from all tables, plus an explanatory note about how Fabric applies data clustering.
Best Practices¶
A reference section with key recommendations:
- Data clustering is most effective on large tables
- Choose mid-to-high cardinality columns used in WHERE filters
- Batch ingestion (≥ 1M rows per DML) for optimal quality
- Equality JOINs do NOT benefit from data clustering
- Column order in CLUSTER BY doesn't affect row storage
char/varcharfirst 32 character limit for statisticsdecimalprecision > 18 predicate pushdown limitation
Customising Report Output¶
Reports are generated from Python dataclasses (TableRecommendation,
ColumnScore) that you can also access directly:
for rec in result.recommendations:
print(f"Table: {rec.schema_name}.{rec.table_name}")
print(f" Rows: {rec.row_count:,}")
print(f" Current CLUSTER BY: {rec.currently_clustered_columns}")
print(f" Warnings: {rec.warnings}")
for col in rec.recommended_columns:
print(f" - {col.column_name}: score={col.composite_score}, {col.recommendation}")
This allows you to build custom reports, dashboards, or integrations.