feat: support `CLUSTER BY [AUTO, NONE]` for Databricks by EhabEasee · Pull Request #5846 · SQLMesh/sqlmesh

EhabEasee · 2026-06-17T15:39:40Z

Description

Databricks supports two keyword forms of liquid clustering that don't take column arguments:

- CLUSTER BY AUTO: lets Databricks automatically select the clustering columns
- CLUSTER BY NONE: disables liquid clustering on a table

Previously, SQLMesh had no way to express these in a model definition. This PR adds support for both.

Constants (constants.py): Adds LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"}), a shared constant used by the parser, the validators, and the adapter.

Parsing (dialect.py): The clustered_by property parser now recognises bare AUTO and NONE tokens (unquoted VAR tokens) as liquid clustering keywords rather than column references. Backtick-quoted `auto` / `none` are still treated as regular column names, preserving backwards compatibility for columns that happen to share those names.
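The keyword-versus-column rule above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in dialect.py: Token, Keyword, Column, and classify_clustered_by are hypothetical stand-ins for sqlglot tokens and expressions, and case-insensitive keyword matching is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union

# Local copy of the shared constant so this sketch runs on its own.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Token:
    """Hypothetical simplified token (SQLMesh itself works with sqlglot tokens)."""

    kind: str  # "VAR" for a bare word, "IDENTIFIER" for a backtick-quoted name
    text: str


@dataclass(frozen=True)
class Keyword:
    """Hypothetical stand-in for the exp.Var keyword sentinel."""

    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference."""

    name: str


def classify_clustered_by(
    tokens: List[Token], parenthesised: bool = False
) -> Union[Keyword, List[Column]]:
    """Return a Keyword for a lone bare AUTO/NONE, otherwise a list of columns."""
    if len(tokens) == 1 and not parenthesised:
        token = tokens[0]
        # Assumption: keyword matching ignores case, as SQL keywords usually do.
        if token.kind == "VAR" and token.text.upper() in LIQUID_CLUSTERING_KEYWORDS:
            return Keyword(token.text.upper())
    # Quoted names, parenthesised forms, and mixed lists such as (a, AUTO)
    # all remain column references.
    return [Column(token.text) for token in tokens]
```

For example, a bare AUTO token yields Keyword("AUTO"), while the same word backtick-quoted (kind "IDENTIFIER") yields [Column("auto")], which is how columns named auto or none keep working.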

Validation (meta.py): A single string passed to clustered_by is normalised to a list before processing. The validator then skips the column-count check for exp.Var(AUTO|NONE), but only when the field is clustered_by and the dialect is databricks. On deserialisation from JSON, keyword strings are restored to exp.Var sentinels before list_of_fields_validator can normalise them into quoted columns.
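The normalise, restore, and gate order described above might look roughly like the following. It is a hypothetical sketch, not meta.py: Var and ConfigError are local stand-ins for sqlglot's exp.Var and SQLMesh's ConfigError, normalise_fields is an invented name, exact uppercase matching is an assumption, and the column-count check itself is not modelled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


class ConfigError(Exception):
    """Hypothetical stand-in for SQLMesh's ConfigError."""


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for the exp.Var keyword sentinel."""

    this: str


FieldValue = Union[str, Var]


def normalise_fields(
    value: Union[FieldValue, Sequence[FieldValue]], field: str, dialect: str
) -> List[FieldValue]:
    # 1. A single string or Var is wrapped in a list before further processing.
    items: List[FieldValue] = [value] if isinstance(value, (str, Var)) else list(value)

    # Keyword sentinels are only meaningful for Databricks clustered_by.
    keywords_allowed = field == "clustered_by" and dialect == "databricks"

    # 2. A lone keyword string (e.g. read back from JSON) is restored to a Var
    #    so later steps do not turn it into a quoted column name.
    if (
        keywords_allowed
        and len(items) == 1
        and isinstance(items[0], str)
        and items[0] in LIQUID_CLUSTERING_KEYWORDS  # assumption: exact uppercase match
    ):
        items = [Var(items[0])]

    # 3. Any Var outside the allowed scope, or mixed with columns, is rejected.
    for item in items:
        if isinstance(item, Var) and (
            not keywords_allowed
            or len(items) != 1
            or item.this not in LIQUID_CLUSTERING_KEYWORDS
        ):
            raise ConfigError(f"{item.this} is not valid for {field} in dialect {dialect!r}")
    return items
```

Note that a keyword string inside a multi-item list, such as ["a", "AUTO"], is left as an ordinary column name, matching the mixed-list behaviour listed in the test plan.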

Validation (definition.py): The validate_definition column-existence check skips keyword sentinels for the same clustered_by + databricks scope.
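A minimal sketch of a column-existence check that skips keyword sentinels, assuming clustered_by has already been normalised to a list. missing_clustering_columns and the local Var class are hypothetical names for illustration, not SQLMesh's validate_definition.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Union

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for the exp.Var keyword sentinel."""

    this: str


def missing_clustering_columns(
    clustered_by: Iterable[Union[str, Var]], known_columns: Set[str], dialect: str
) -> List[str]:
    """Return clustered_by entries that are not columns of the model."""
    missing = []
    for item in clustered_by:
        # AUTO/NONE name no column, so they are skipped, but only for Databricks.
        if (
            dialect == "databricks"
            and isinstance(item, Var)
            and item.this in LIQUID_CLUSTERING_KEYWORDS
        ):
            continue
        name = item.this if isinstance(item, Var) else item
        if name not in known_columns:
            missing.append(name)
    return missing
```

A real column named auto is still checked like any other column, because only Var sentinels are skipped.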

Code generation (databricks.py): _build_table_properties_exp detects a single exp.Var in clustered_by (guarded by a ValueError if the Var holds an unexpected value), and emits CLUSTER BY AUTO / CLUSTER BY NONE without wrapping in a tuple. Multi-column paths are unchanged.
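The emission rule can be sketched as a small string renderer. This is a hypothetical illustration, not _build_table_properties_exp, which builds sqlglot expressions rather than strings; render_cluster_by and the local Var class are invented, and the backtick quoting on the column path is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for the exp.Var keyword sentinel."""

    this: str


def render_cluster_by(clustered_by: List[Union[str, Var]]) -> str:
    # A single keyword sentinel is emitted bare, with no tuple around it.
    if len(clustered_by) == 1 and isinstance(clustered_by[0], Var):
        keyword = clustered_by[0].this
        if keyword not in LIQUID_CLUSTERING_KEYWORDS:
            # Guard against a Var holding anything other than AUTO/NONE.
            raise ValueError(f"Unexpected clustered_by keyword: {keyword!r}")
        return f"CLUSTER BY {keyword}"
    if any(isinstance(col, Var) for col in clustered_by):
        raise ValueError("Keyword sentinels must appear on their own")
    # Column path (assumption): backtick-quote each name, doubling embedded
    # backticks, and wrap the list in parentheses.
    columns = ", ".join("`" + col.replace("`", "``") + "`" for col in clustered_by)
    return f"CLUSTER BY ({columns})"
```

Keeping the keyword branch separate from the column branch means the existing multi-column output does not change, which matches the description above.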

Usage:

```sql
-- In a SQLMesh model definition
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by AUTO
);
```

```sql
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by NONE
);
```

Via the Python API, both a plain string and exp.Var are accepted:

```python
from sqlglot import exp
from sqlmesh.core.model import create_sql_model

create_sql_model(..., dialect="databricks", clustered_by="AUTO")
create_sql_model(..., dialect="databricks", clustered_by=exp.Var(this="AUTO"))
```

Columns with the names auto or none are still supported via backtick quoting:

```sql
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by (`auto`, `none`)
);
```

Test Plan

- tests/core/test_dialect.py: parser round-trips for AUTO/NONE keywords, backtick-quoted columns, paren-wrapped single columns, multi-column lists, a mixed list (a, AUTO), and a non-Databricks dialect
- tests/core/test_model.py: model DDL; Python API with both exp.Var and a plain string; backtick-quoted column names; render_definition output; JSON serialisation round-trip; non-Databricks dialect rejection; mixed-list column treatment
- tests/core/engine_adapter/test_databricks.py: the adapter emits CLUSTER BY AUTO / CLUSTER BY NONE without column parentheses

Checklist

- I have run make style and fixed any issues
- I have added tests for my changes (if applicable)
- All existing tests pass (make fast-test)
- My commits are signed off (git commit -s) per the DCO

…id clustering

Adds parser, validator, and Databricks adapter support for the keyword forms of liquid clustering. Bare AUTO/NONE (unquoted VAR tokens) are recognised as keywords; backtick-quoted `auto`/`none` and parenthesised forms remain real column references.

- Add LIQUID_CLUSTERING_KEYWORDS constant to avoid repeating the sentinel set across dialect, meta, definition, and adapter
- Parser (dialect.py): detect VAR-token AUTO/NONE on clustered_by; strip Paren from single-column clustered_by to match partitioned_by normalisation
- Validator (meta.py): normalise single string input to list; restore keyword sentinels from JSON strings on deserialisation; skip column-count check for keywords, gated on clustered_by + databricks
- validate_definition (definition.py): skip keyword sentinels in the column-existence check, same gate
- Adapter (databricks.py): emit CLUSTER BY AUTO / CLUSTER BY NONE without a tuple wrapper; raise ValueError on unexpected bare Var
- Tests: parser round-trips, Python API (exp.Var and plain string), backtick-quoted columns, render_definition, JSON round-trip, non-Databricks rejection, mixed-list behaviour, adapter SQL emission

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

StuffbyYuki · 2026-06-29T05:27:44Z

@EhabEasee Thanks for this PR!

Not trying to be nitpicky, but here are a few items:

- Docs: Add a note in the model docs that Databricks supports clustered_by AUTO / NONE, and that backticks are needed for real columns named auto/none.
- Test: in test_clustered_by_keyword_non_databricks_dialect, perhaps use pytest.raises(ConfigError) instead of pytest.raises((ConfigError, Exception)).

Let me know if I'm missing anything!


EhabEasee · 2026-06-29T08:53:15Z

@StuffbyYuki both comments make sense and I've made the updates. However, the comment in the docs feels misplaced and easy to miss.

I was considering adding it in the Databricks engine docs but couldn't find a reasonable place to add it. Do you have any suggestions on a more relevant place to add that note? The StarRocks docs seem to have something similar so I could imitate that?

StuffbyYuki · 2026-06-29T15:04:10Z

@EhabEasee thanks! Yeah, I don't think it has to be a big block like the StarRocks docs have, but I figured adding something somewhere in the docs might be helpful! I'll let you decide where and how to put it in the docs.


EhabEasee · 2026-06-30T07:13:02Z

@StuffbyYuki I added a new section to the Databricks integration docs. Let me know if you have any more feedback.

EhabEasee force-pushed the feat/clustered-by-auto-none branch from 6f3e9a9 to 4f29141 on June 25, 2026 09:39

EhabEasee changed the title ~~feat: support CLUSTER BY AUTO and CLUSTER BY NONE for Databricks liquid clustering~~ feat: support CLUSTER BY [AUTO, NONE] for Databricks Jun 25, 2026

StuffbyYuki self-requested a review June 29, 2026 05:27

EhabEasee and others added 3 commits June 29, 2026 08:45

- Merge branch 'main' into feat/clustered-by-auto-none (6f2b797)
- docs: note Databricks liquid clustering AUTO/NONE keywords in clustered_by docs (bb70305)
- test: narrow pytest.raises to ConfigError in test_clustered_by_keyword_non_databricks_dialect (c735b63)

EhabEasee added 3 commits June 30, 2026 09:08

- Revert "docs: note Databricks liquid clustering AUTO/NONE keywords in clustered_by docs" (679f6e5): This reverts commit bb70305.
- docs: note Databricks liquid clustering AUTO/NONE keywords in integration docs (ecca850)
- docs: add seperator lines (2586168)

EhabEasee wants to merge 7 commits into SQLMesh:main from EhabEasee:feat/clustered-by-auto-none
