Skip to content

Reports

The advisor produces reports in three formats. All formats contain the same information — choose whichever fits your workflow.

Report Formats

Format Method Best For
Text result.text_report print() in a notebook cell; quick console overview
Markdown result.markdown_report Saving as .md; rendering in GitHub, wikis, documentation
HTML result.html_report displayHTML() in Fabric notebooks; rich visual display

Viewing Reports

Text Report

The text report is not automatically printed at the end of advisor.run(). To print it:

print(result.text_report)

The HTML report renders natively in a Fabric notebook cell:

displayHTML(result.html_report)

It includes: - Summary cards (tables analysed, recommendations, etc.) - Per-table sections with sortable score tables - Visual score bars - Color-coded recommendation badges - Collapsible DDL sections - Best practices reference

Markdown Report

print(result.markdown_report)

Saving Reports

Use the result.save() method or the standalone save_report() function:

# Via DataClusteringResult
result.save("/lakehouse/default/Files/reports/report.html")           # HTML (default)
result.save("/lakehouse/default/Files/reports/report.md", "md")       # Markdown
result.save("/lakehouse/default/Files/reports/report.txt", "txt")     # Plain text

# Via standalone function
from fabric_warehouse_advisor import save_report

save_report(result.html_report, "/path/to/report.html", format="html")
save_report(result.markdown_report, "/path/to/report.md", format="md")

The format parameter accepts "html", "md", or "txt". When omitted, it is inferred from the file extension.

For HTML format, if the content doesn't already contain <html> tags, the save function wraps it in a minimal HTML document with UTF-8 encoding and a title.

Parent directories are created automatically.

Report Sections

All three formats include the following sections:

Executive Summary

  • Total tables analysed
  • Tables with recommendations
  • Tables already clustered
  • Score threshold used

Per-Table Recommendations

For each table the report includes:

  • Table name, schema, and row count
  • Current CLUSTER BY columns (if any)
  • Warnings for sub-optimal existing clustering
  • Suggested CTAS DDL (when generate_ctas=True)

Each table also contains a column-level detail table with:

  • Column name and data type
  • Predicate hits (weighted by query runs)
  • Approximate distinct count, cardinality ratio, and percentage
  • Cardinality classification (High/Medium/Low)
  • Composite score (with visual bar in HTML)
  • Recommendation label
  • Optimization warnings (if applicable)

All Suggested DDL

A consolidated section with every CTAS statement from all tables, plus an explanatory note about how Fabric applies data clustering.

Best Practices

A reference section with key recommendations:

  • Data clustering is most effective on large tables
  • Choose mid-to-high cardinality columns used in WHERE filters
  • Batch ingestion (≥ 1M rows per DML) for optimal quality
  • Equality JOINs do NOT benefit from data clustering
  • Column order in CLUSTER BY doesn't affect row storage
  • char/varchar first 32 character limit for statistics
  • decimal precision > 18 predicate pushdown limitation

Customising Report Output

Reports are generated from Python dataclasses (TableRecommendation, ColumnScore) that you can also access directly:

for rec in result.recommendations:
    print(f"Table: {rec.schema_name}.{rec.table_name}")
    print(f"  Rows: {rec.row_count:,}")
    print(f"  Current CLUSTER BY: {rec.currently_clustered_columns}")
    print(f"  Warnings: {rec.warnings}")
    for col in rec.recommended_columns:
        print(f"  - {col.column_name}: score={col.composite_score}, {col.recommendation}")

This allows you to build custom reports, dashboards, or integrations.