
docs: document ClickBench setup details #23315

Open
ByteBaker wants to merge 1 commit into apache:main from ByteBaker:doc/clickbench-setup-update


ByteBaker commented Jul 3, 2026

Which issue does this PR close?

Rationale for this change

Docs consolidation. Explained in the issue.

What changes are included in this PR?

Only documentation.

Are these changes tested?

N/A. No code changes.

Are there any user-facing changes?

None. Only documentation.

LLM-generated code disclosure

This PR includes LLM-generated content, all of which was manually reviewed.

ByteBaker force-pushed the doc/clickbench-setup-update branch from ffb74bf to 851ad22 on July 3, 2026 21:43
ByteBaker (Author)

Hi @alamb, could you please review this?

comphead (Contributor) commented Jul 3, 2026

@kosiew FYI

Comment thread on `benchmarks/README.md` (outdated)
The runner registers the parquet data as `hits_raw`, then creates a
`hits` view that casts `EventDate` through `INTEGER` to `DATE` for the
benchmark queries.
- The full and partitioned ClickBench datasets may store string columns

Contributor

This only applies to the partitioned ClickBench dataset: its string columns don't have a "string" logical type annotation in the Parquet files.

Maybe a better description is:

The source partitioned ClickBench dataset has string columns without
the "string" Parquet logical type annotation. These columns must be treated
as strings for the queries to run correctly, so the runner enables the
Parquet `binary_as_string` option.

Comment thread on `benchmarks/README.md` (outdated)
```sql
CREATE EXTERNAL TABLE hits_raw
STORED AS PARQUET
LOCATION 'benchmarks/data/hits.parquet'
```

Contributor

I'm pretty sure this is only necessary for `hits_partitioned`.

Comment thread on `benchmarks/README.md` (outdated)

```shell
./benchmarks/bench.sh data clickbench_1
cargo run --release --bin dfbench -- clickbench \
```

Contributor

I don't think anyone would run a command like that. Instead, anyone who wants to run all the queries would use `bench.sh run`.

alamb (Contributor) commented Jul 4, 2026

Thank you for this, @ByteBaker.

ByteBaker force-pushed the doc/clickbench-setup-update branch from 851ad22 to 988d500 on July 4, 2026 13:32
ByteBaker (Author)

@alamb, I updated the doc based on your comments above. Please review again.

ByteBaker requested a review from alamb on July 4, 2026 13:32


Development

Successfully merging this pull request may close these issues.

Consolidate ClickBench Setup Documentation in benchmarks/README.md

3 participants