Reports

The advisor produces reports in three formats. All formats contain the same information — choose whichever fits your workflow.

Report Formats

| Format | Attribute | Best For |
| --- | --- | --- |
| Text | result.text_report | print() in a notebook cell; quick console overview |
| Markdown | result.markdown_report | Saving as .md; rendering in GitHub, wikis, documentation |
| HTML | result.html_report | Saving as .html and opening in a web browser for rich visual display, or rendering with displayHTML() in Fabric notebooks |

Web Browser is recommended

The best way to visualize the report is to save it as HTML, which provides the full experience with rich features and interactivity.

Viewing Reports

The HTML report renders natively in a Fabric notebook cell:

displayHTML(result.html_report)

It includes:

  • Sidebar with tab-based navigation across tables, DDL, and best practices
  • Summary cards (tables analysed, recommendations, etc.)
  • Per-table sections with sortable score tables
  • Column header tooltips explaining each metric
  • Visual score bars
  • Color-coded recommendation badges
  • Collapsible DDL sections
  • Best practices reference

Markdown Report

print(result.markdown_report)

Text Report

The text report is not automatically printed at the end of advisor.run(). To print it:

print(result.text_report)

Saving Reports

Use the result.save() method or the standalone save_report() function:

# Via DataClusteringResult
result.save("/lakehouse/default/Files/reports/report.html")           # HTML (default)
result.save("/lakehouse/default/Files/reports/report.md", "md")       # Markdown
result.save("/lakehouse/default/Files/reports/report.txt", "txt")     # Plain text

# Via standalone function
from fabric_warehouse_advisor import save_report

save_report(result.html_report, "/path/to/report.html", format="html")
save_report(result.markdown_report, "/path/to/report.md", format="md")

The format parameter accepts "html", "md", or "txt". When omitted, it is inferred from the file extension.

For HTML format, if the content doesn't already contain <html> tags, the save function wraps it in a minimal HTML document with UTF-8 encoding and a title.

Parent directories are created automatically.
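
The behaviours described above (format inference from the file extension, HTML wrapping, and automatic directory creation) can be illustrated with a small standalone sketch. This is not the library's implementation: `infer_format`, `wrap_html`, and `write_report` are hypothetical names, and the wrapper markup and the default title are assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from file extension to format name (illustrative only).
_EXT_TO_FORMAT = {".html": "html", ".md": "md", ".txt": "txt"}


def infer_format(path, format=None):
    """Return the explicit format, or guess it from the file extension."""
    if format is not None:
        return format
    ext = Path(path).suffix.lower()
    if ext not in _EXT_TO_FORMAT:
        raise ValueError(f"Cannot infer report format from extension {ext!r}")
    return _EXT_TO_FORMAT[ext]


def wrap_html(content, title="Report"):
    """Wrap a fragment in a minimal UTF-8 HTML document if it lacks <html>."""
    if "<html" in content.lower():
        return content  # already a full document, leave untouched
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body>\n{content}\n</body></html>\n"
    )


def write_report(content, path, format=None):
    """Write report text to path, creating parent directories as needed."""
    fmt = infer_format(path, format)
    if fmt == "html":
        content = wrap_html(content)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

In practice result.save() and save_report() already handle these steps; the sketch only shows what to expect from them.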

Report Sections

All three formats include the following sections:

Executive Summary

  • Total tables analysed
  • Tables with recommendations
  • Tables already clustered
  • Score threshold used

Per-Table Recommendations

For each table the report includes:

  • Table name, schema, and row count
  • Current CLUSTER BY columns (if any)
  • Warnings for sub-optimal existing clustering
  • Suggested CTAS DDL (when generate_ctas=True)

Each table also contains a column-level detail table with:

  • Column name and data type
  • Predicate hits (weighted by query runs)
  • Approximate distinct count, cardinality ratio, and percentage
  • Cardinality classification (High/Medium/Low)
  • Composite score (with visual bar in HTML)
  • Recommendation label
  • Optimization warnings (if applicable)
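
To make the cardinality columns easier to read, the following sketch shows one plausible way a ratio, a percentage, and a High/Medium/Low label could be derived from an approximate distinct count and a row count. The function names and, in particular, the threshold values are assumptions chosen for illustration; the advisor's actual cut-offs are not stated on this page.

```python
def cardinality_metrics(approx_distinct, row_count):
    """Return (ratio, percentage) of distinct values relative to rows."""
    if row_count <= 0:
        return 0.0, 0.0  # avoid division by zero on empty tables
    ratio = approx_distinct / row_count
    return ratio, ratio * 100.0


def classify_cardinality(ratio, low_below=0.001, high_from=0.1):
    """Map a cardinality ratio to a label.

    The thresholds are illustrative assumptions, not the advisor's values.
    """
    if ratio >= high_from:
        return "High"
    if ratio >= low_below:
        return "Medium"
    return "Low"


# Example: 50,000 distinct values in a 1,000,000-row table.
ratio, pct = cardinality_metrics(approx_distinct=50_000, row_count=1_000_000)
label = classify_cardinality(ratio)
```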

All Suggested DDL

A consolidated section with every CTAS statement from all tables, plus an explanatory note about how Fabric applies data clustering.

Best Practices

A reference section with key recommendations:

  • Data clustering is most effective on large tables
  • Choose mid-to-high cardinality columns used in WHERE filters
  • Ingest data in batches (≥ 1M rows per DML) for optimal clustering quality
  • Equality JOINs do NOT benefit from data clustering
  • Column order in CLUSTER BY doesn't affect row storage
  • For char/varchar columns, only the first 32 characters are used for statistics
  • For decimal columns with precision > 18, predicates are not pushed down to storage

Customizing Report Output

The DataClusteringResult object exposes several attributes beyond the pre-formatted reports:

| Attribute | Type | Description |
| --- | --- | --- |
| recommendations | list[TableRecommendation] | Per-table recommendations with nested ColumnScore objects. |
| all_scores | list[ColumnScore] | Flat list of every scored column across all tables. |
| scores_df | DataFrame | Spark DataFrame with the detailed scores; useful for custom queries, joins, or saving to a Lakehouse table. |
| captured_at | str | ISO-8601 UTC timestamp of when the advisor run completed. |

You can work with the Spark DataFrame for further analysis:

# Show top candidates
display(result.scores_df.orderBy("composite_score", ascending=False))

# Save scores to a Lakehouse table
result.scores_df.write.mode("overwrite").saveAsTable("data_clustering_scores")

This allows you to build custom reports, dashboards, or integrations.
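
As a rough idea of what a custom report might look like, the sketch below groups score rows by table and keeps the best-scoring columns of each. It works on plain dictionaries so it runs without Spark; such rows could be obtained, for example, with `[r.asDict() for r in result.scores_df.collect()]`. Only `composite_score` appears on this page; the `table_name` and `column_name` keys, the helper names, and the sample values are assumptions.

```python
from collections import defaultdict


def top_columns_per_table(rows, n=2, threshold=0.0):
    """Group score rows by table and keep the n best columns of each."""
    by_table = defaultdict(list)
    for row in rows:
        if row["composite_score"] >= threshold:
            by_table[row["table_name"]].append(row)
    return {
        table: sorted(cols, key=lambda r: r["composite_score"], reverse=True)[:n]
        for table, cols in by_table.items()
    }


def render_summary(top):
    """Render the grouped result as plain-text lines, tables in name order."""
    lines = []
    for table in sorted(top):
        lines.append(table)
        for col in top[table]:
            lines.append(f"  {col['column_name']}: {col['composite_score']:.1f}")
    return "\n".join(lines)


# Made-up sample rows for illustration only (key names are assumptions).
sample = [
    {"table_name": "dbo.orders", "column_name": "order_date", "composite_score": 82.5},
    {"table_name": "dbo.orders", "column_name": "status", "composite_score": 12.0},
    {"table_name": "dbo.orders", "column_name": "customer_id", "composite_score": 64.0},
    {"table_name": "dbo.customers", "column_name": "region", "composite_score": 40.0},
]

print(render_summary(top_columns_per_table(sample, n=2)))
```

Adjust the key names to match the actual columns of scores_df before using this against real results.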