refactor(hash-aggr): Simplify aggregate hash table by 2010YOUY01 · Pull Request #23309 · apache/datafusion

2010YOUY01 · 2026-07-03T14:18:47Z

Which issue does this PR close?

Closes #.

Rationale for this change

Addressing @alamb ’s comment from #23181 (review)

The motivation behind the EPIC was that the existing aggregation code had too many internal flags and was reused by many different execution paths, making it hard to maintain. The previous refactor made two main changes:

Split different aggregation modes, such as partial and final, into separate streams.
Split the hash tables used by different aggregation modes into separate implementations.

Now that the refactor is almost done, with only single aggregation mode and spilling behavior left, I think decision 1 was still solid, while decision 2 was an overcorrection.

A module usually becomes hard to maintain when it has many tightly coupled internal flags. In the old GroupsHashAggregateStream case, there were ~10 such flags. But a small number of flags, I think <= 3, is still clean and manageable.

This PR unifies the aggregate hash tables used by different modes into a single implementation. The behavioral differences are controlled by one internal mode flag. See the comments in common.rs: struct AggregateHashTable, for details.

Since this refactor is close to completion, I can confirm the internal complexity of AggregateHashTable is not likely to bloat in the long term, so unifying these implementations seems like the better design.

What changes are included in this PR?

Replace separated aggregate hash table implementation with a unified one

Are these changes tested?

Existing tests

Are there any user-facing changes?

alamb

Thank you @2010YOUY01 -- I think this looks like a nice cleanup / consolidation to me

alamb · 2026-07-03T21:06:24Z

+}
+
+/// Marker for ordered raw rows -> partial state aggregation.
 pub(in crate::aggregates) struct PartialMarker;


do you plan to remove this PartialMarker for the ordered streams too? Maybe we could do the same simplification

Yes, this can be done similarly.

But I think there is a better fast path to implement for fully ordered case (query group by a,b, input order order by a,b), this idea can be implemented in a completely separated path without depending on the aggregate hash tables; while the partially ordered case still relies on hash table, and only do early emit optimization.

So I think probably its easier to do after we have further optimized the full order fast path.

Filed #23318 to track.

alamb · 2026-07-03T21:08:43Z

+        // table semantics. Consider remove `AggregateMode` and only use `AggregateTableMode`
+        // after the refactor has finished.
+        //
+        // Issue: <https://github.com/apache/datafusion/pull/22729>


This seems like a PR (not an issue) -- is it the link you intended?

Note I agree with the fact that AggregateMode seems overly complicated. I vaguely remember trying to remove it once but I can't find the PR now (maybe I never pushed it)

Yes, that's a wrong link. Fixed

alamb · 2026-07-03T21:10:22Z

+                .iter_mut()
+                .zip(evaluated_batch.accumulator_args.iter())
+            {
+                match mode {


it probably doesn't matter as this is one match per column, but I wonder if we should do the match outside and the loop inside 🤔

I agree, this way seems a little bit easier to read: 6871dc7

alamb · 2026-07-03T21:11:27Z

+
+/// Hash table used only for converting raw input rows directly into partial
+/// aggregate state rows after partial aggregation has been skipped.
+pub(in crate::aggregates) struct PartialSkipHashTable {


i like that this has a named newtype wrapper

alamb · 2026-07-03T21:12:09Z

run benchmarks

adriangbot · 2026-07-03T21:14:54Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4879357464-843-5qvcm 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing split-aggr-simplify-ht (7f68c23) to e4aa41d (merge-base) diff using: tpch
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-07-03T21:14:54Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4879357464-842-kqn89 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing split-aggr-simplify-ht (7f68c23) to e4aa41d (merge-base) diff using: tpcds
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-07-03T21:15:02Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4879357464-841-jrdm8 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing split-aggr-simplify-ht (7f68c23) to e4aa41d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-07-03T21:28:45Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and split-aggr-simplify-ht
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃         split-aggr-simplify-ht ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.84 / 41.07 ±1.41 / 42.40 ms │ 38.64 / 39.49 ±1.06 / 41.53 ms │     no change │
│ QQuery 2  │ 19.52 / 20.54 ±0.75 / 21.51 ms │ 19.25 / 19.69 ±0.32 / 20.08 ms │     no change │
│ QQuery 3  │ 31.49 / 33.63 ±1.32 / 34.79 ms │ 31.58 / 34.32 ±1.53 / 36.22 ms │     no change │
│ QQuery 4  │ 17.63 / 18.12 ±0.28 / 18.47 ms │ 17.97 / 18.36 ±0.51 / 19.36 ms │     no change │
│ QQuery 5  │ 40.19 / 42.56 ±1.41 / 44.13 ms │ 38.63 / 41.15 ±1.31 / 42.33 ms │     no change │
│ QQuery 6  │ 16.53 / 16.79 ±0.20 / 17.11 ms │ 16.45 / 16.54 ±0.06 / 16.62 ms │     no change │
│ QQuery 7  │ 45.55 / 48.36 ±2.17 / 51.38 ms │ 43.79 / 45.84 ±1.48 / 48.29 ms │ +1.05x faster │
│ QQuery 8  │ 44.16 / 44.31 ±0.16 / 44.53 ms │ 43.60 / 43.76 ±0.13 / 43.97 ms │     no change │
│ QQuery 9  │ 50.66 / 51.30 ±0.58 / 52.24 ms │ 50.66 / 51.42 ±0.52 / 51.94 ms │     no change │
│ QQuery 10 │ 43.09 / 43.82 ±1.04 / 45.84 ms │ 43.12 / 43.44 ±0.31 / 43.85 ms │     no change │
│ QQuery 11 │ 14.04 / 14.70 ±0.80 / 16.24 ms │ 13.91 / 13.97 ±0.06 / 14.05 ms │     no change │
│ QQuery 12 │ 24.41 / 26.10 ±2.41 / 30.84 ms │ 24.58 / 24.76 ±0.19 / 25.10 ms │ +1.05x faster │
│ QQuery 13 │ 34.31 / 36.23 ±1.66 / 39.33 ms │ 32.50 / 34.47 ±1.16 / 35.80 ms │     no change │
│ QQuery 14 │ 24.58 / 25.34 ±1.11 / 27.51 ms │ 24.59 / 25.15 ±0.54 / 26.09 ms │     no change │
│ QQuery 15 │ 31.86 / 32.68 ±0.77 / 33.99 ms │ 32.03 / 32.70 ±0.50 / 33.56 ms │     no change │
│ QQuery 16 │ 14.04 / 14.40 ±0.26 / 14.77 ms │ 14.61 / 14.76 ±0.12 / 14.96 ms │     no change │
│ QQuery 17 │ 74.24 / 75.68 ±1.10 / 76.95 ms │ 76.21 / 77.13 ±1.30 / 79.71 ms │     no change │
│ QQuery 18 │ 59.40 / 60.83 ±1.27 / 62.74 ms │ 60.90 / 62.32 ±1.46 / 64.28 ms │     no change │
│ QQuery 19 │ 33.73 / 33.97 ±0.21 / 34.25 ms │ 33.90 / 34.10 ±0.21 / 34.49 ms │     no change │
│ QQuery 20 │ 33.20 / 33.31 ±0.12 / 33.50 ms │ 33.00 / 33.25 ±0.42 / 34.08 ms │     no change │
│ QQuery 21 │ 57.04 / 58.10 ±1.01 / 59.81 ms │ 57.58 / 58.61 ±0.93 / 60.15 ms │     no change │
│ QQuery 22 │ 14.42 / 14.55 ±0.09 / 14.63 ms │ 14.08 / 14.75 ±0.48 / 15.49 ms │     no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                     ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 786.41ms │
│ Total Time (split-aggr-simplify-ht)   │ 779.96ms │
│ Average Time (HEAD)                   │  35.75ms │
│ Average Time (split-aggr-simplify-ht) │  35.45ms │
│ Queries Faster                        │        2 │
│ Queries Slower                        │        0 │
│ Queries with No Change                │       20 │
│ Queries with Failure                  │        0 │
└───────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	1.1 GiB
Avg memory	507.1 MiB
CPU user	22.8s
CPU sys	1.8s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	1.1 GiB
Avg memory	501.2 MiB
CPU user	23.0s
CPU sys	1.6s
Peak spill	0 B

File an issue against this benchmark runner

adriangbot · 2026-07-03T21:30:27Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and split-aggr-simplify-ht
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃                split-aggr-simplify-ht ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.52 / 6.00 ±0.86 / 7.72 ms │           5.58 / 6.08 ±0.86 / 7.79 ms │     no change │
│ QQuery 2  │        82.18 / 82.38 ±0.15 / 82.59 ms │        80.74 / 81.00 ±0.21 / 81.25 ms │     no change │
│ QQuery 3  │        29.51 / 29.76 ±0.24 / 30.22 ms │        29.55 / 29.83 ±0.18 / 30.06 ms │     no change │
│ QQuery 4  │     485.95 / 492.28 ±3.32 / 495.00 ms │    545.86 / 566.81 ±10.66 / 575.63 ms │  1.15x slower │
│ QQuery 5  │        51.91 / 52.16 ±0.31 / 52.75 ms │        51.60 / 52.96 ±1.44 / 55.56 ms │     no change │
│ QQuery 6  │        36.54 / 37.18 ±0.37 / 37.57 ms │        36.19 / 36.67 ±0.25 / 36.92 ms │     no change │
│ QQuery 7  │        94.56 / 95.36 ±0.67 / 96.37 ms │       93.47 / 96.31 ±2.59 / 100.82 ms │     no change │
│ QQuery 8  │        37.01 / 38.27 ±1.76 / 41.73 ms │        36.56 / 36.99 ±0.52 / 38.01 ms │     no change │
│ QQuery 9  │        53.01 / 55.13 ±1.76 / 58.34 ms │        52.87 / 53.92 ±0.82 / 55.17 ms │     no change │
│ QQuery 10 │        62.91 / 63.38 ±0.29 / 63.72 ms │        63.82 / 64.55 ±0.67 / 65.77 ms │     no change │
│ QQuery 11 │     305.88 / 315.55 ±6.94 / 326.16 ms │     297.00 / 301.90 ±2.66 / 305.12 ms │     no change │
│ QQuery 12 │        29.31 / 29.61 ±0.19 / 29.78 ms │        28.55 / 28.81 ±0.14 / 28.96 ms │     no change │
│ QQuery 13 │     119.63 / 120.33 ±0.89 / 122.06 ms │     118.62 / 120.18 ±1.21 / 122.34 ms │     no change │
│ QQuery 14 │     413.99 / 419.88 ±3.84 / 425.70 ms │     413.96 / 418.42 ±3.74 / 425.29 ms │     no change │
│ QQuery 15 │        57.85 / 58.62 ±0.64 / 59.56 ms │        57.10 / 58.67 ±1.10 / 60.16 ms │     no change │
│ QQuery 16 │           6.84 / 6.96 ±0.16 / 7.28 ms │           6.75 / 6.93 ±0.21 / 7.35 ms │     no change │
│ QQuery 17 │        80.93 / 82.30 ±1.95 / 86.15 ms │        80.99 / 82.38 ±1.28 / 84.08 ms │     no change │
│ QQuery 18 │     124.44 / 125.71 ±1.02 / 127.04 ms │     123.43 / 125.26 ±1.42 / 127.13 ms │     no change │
│ QQuery 19 │        41.54 / 41.66 ±0.10 / 41.82 ms │        41.67 / 42.59 ±0.94 / 44.40 ms │     no change │
│ QQuery 20 │        35.80 / 36.97 ±1.83 / 40.62 ms │        35.84 / 37.01 ±0.99 / 38.43 ms │     no change │
│ QQuery 21 │        17.59 / 18.04 ±0.31 / 18.48 ms │        18.02 / 18.18 ±0.15 / 18.44 ms │     no change │
│ QQuery 22 │        62.39 / 62.90 ±0.42 / 63.66 ms │        62.30 / 64.02 ±1.21 / 65.42 ms │     no change │
│ QQuery 23 │     348.97 / 352.54 ±2.80 / 356.84 ms │     341.57 / 347.49 ±3.94 / 351.81 ms │     no change │
│ QQuery 24 │     226.10 / 231.53 ±6.39 / 241.59 ms │     225.63 / 227.34 ±2.82 / 232.94 ms │     no change │
│ QQuery 25 │     118.74 / 119.43 ±0.45 / 120.15 ms │     110.17 / 111.75 ±1.12 / 113.56 ms │ +1.07x faster │
│ QQuery 26 │        62.07 / 63.03 ±0.60 / 63.94 ms │        58.68 / 59.59 ±0.70 / 60.83 ms │ +1.06x faster │
│ QQuery 27 │           7.11 / 7.21 ±0.17 / 7.55 ms │           6.38 / 6.51 ±0.17 / 6.84 ms │ +1.11x faster │
│ QQuery 28 │        63.64 / 64.12 ±0.57 / 65.18 ms │        61.00 / 61.70 ±0.68 / 62.90 ms │     no change │
│ QQuery 29 │     102.26 / 107.95 ±6.86 / 120.47 ms │        97.00 / 97.57 ±0.68 / 98.89 ms │ +1.11x faster │
│ QQuery 30 │        36.18 / 36.86 ±0.55 / 37.84 ms │        33.04 / 34.74 ±1.95 / 38.56 ms │ +1.06x faster │
│ QQuery 31 │     117.80 / 120.84 ±1.74 / 122.86 ms │     111.98 / 112.23 ±0.20 / 112.54 ms │ +1.08x faster │
│ QQuery 32 │        21.78 / 22.68 ±0.68 / 23.83 ms │        20.53 / 21.00 ±0.53 / 21.85 ms │ +1.08x faster │
│ QQuery 33 │        38.92 / 39.55 ±0.65 / 40.46 ms │        37.87 / 38.33 ±0.33 / 38.64 ms │     no change │
│ QQuery 34 │        10.56 / 10.97 ±0.43 / 11.75 ms │         9.93 / 11.72 ±2.70 / 17.04 ms │  1.07x slower │
│ QQuery 35 │        77.97 / 78.29 ±0.19 / 78.51 ms │        72.44 / 73.35 ±0.70 / 74.48 ms │ +1.07x faster │
│ QQuery 36 │           6.29 / 6.43 ±0.15 / 6.72 ms │           6.03 / 6.15 ±0.16 / 6.47 ms │     no change │
│ QQuery 37 │           7.53 / 7.58 ±0.05 / 7.68 ms │           7.04 / 7.17 ±0.11 / 7.35 ms │ +1.06x faster │
│ QQuery 38 │        62.93 / 65.41 ±2.01 / 67.84 ms │        61.85 / 62.52 ±0.57 / 63.38 ms │     no change │
│ QQuery 39 │        90.78 / 91.98 ±0.84 / 93.17 ms │        89.98 / 91.18 ±0.91 / 92.76 ms │     no change │
│ QQuery 40 │        23.69 / 24.10 ±0.30 / 24.39 ms │        23.73 / 24.24 ±0.37 / 24.83 ms │     no change │
│ QQuery 41 │        11.55 / 11.66 ±0.12 / 11.89 ms │        11.57 / 11.79 ±0.24 / 12.25 ms │     no change │
│ QQuery 42 │        23.75 / 24.13 ±0.24 / 24.41 ms │        23.77 / 24.20 ±0.35 / 24.77 ms │     no change │
│ QQuery 43 │           4.91 / 5.02 ±0.16 / 5.34 ms │           5.12 / 5.22 ±0.15 / 5.52 ms │     no change │
│ QQuery 44 │           9.34 / 9.42 ±0.07 / 9.51 ms │           9.50 / 9.67 ±0.11 / 9.85 ms │     no change │
│ QQuery 45 │        39.77 / 42.79 ±3.12 / 48.83 ms │        37.80 / 38.44 ±0.88 / 40.17 ms │ +1.11x faster │
│ QQuery 46 │        11.89 / 12.49 ±0.50 / 13.17 ms │        11.88 / 12.13 ±0.17 / 12.38 ms │     no change │
│ QQuery 47 │     231.00 / 233.21 ±2.95 / 238.89 ms │    228.03 / 261.78 ±26.13 / 293.45 ms │  1.12x slower │
│ QQuery 48 │        96.83 / 97.63 ±0.57 / 98.58 ms │      98.69 / 101.32 ±2.00 / 104.57 ms │     no change │
│ QQuery 49 │        75.98 / 77.31 ±0.93 / 78.91 ms │        79.49 / 81.60 ±2.52 / 86.50 ms │  1.06x slower │
│ QQuery 50 │        58.86 / 59.35 ±0.35 / 59.88 ms │        61.97 / 62.93 ±0.61 / 63.81 ms │  1.06x slower │
│ QQuery 51 │        91.42 / 94.01 ±2.92 / 99.21 ms │       97.24 / 99.03 ±1.40 / 100.82 ms │  1.05x slower │
│ QQuery 52 │        23.87 / 24.75 ±0.85 / 25.92 ms │        25.42 / 26.15 ±0.89 / 27.86 ms │  1.06x slower │
│ QQuery 53 │        29.65 / 29.96 ±0.29 / 30.49 ms │        30.38 / 30.84 ±0.41 / 31.52 ms │     no change │
│ QQuery 54 │        55.55 / 55.88 ±0.33 / 56.46 ms │        57.37 / 57.84 ±0.38 / 58.26 ms │     no change │
│ QQuery 55 │        23.24 / 23.56 ±0.38 / 24.29 ms │        24.18 / 24.44 ±0.24 / 24.82 ms │     no change │
│ QQuery 56 │        38.66 / 39.02 ±0.21 / 39.21 ms │        40.47 / 40.58 ±0.10 / 40.73 ms │     no change │
│ QQuery 57 │     177.64 / 179.94 ±2.34 / 184.17 ms │     176.51 / 183.06 ±6.13 / 191.30 ms │     no change │
│ QQuery 58 │     118.49 / 119.65 ±1.09 / 121.24 ms │     114.88 / 118.06 ±2.48 / 122.34 ms │     no change │
│ QQuery 59 │     118.48 / 118.90 ±0.37 / 119.38 ms │     117.34 / 117.83 ±0.37 / 118.32 ms │     no change │
│ QQuery 60 │        39.87 / 41.31 ±1.41 / 43.66 ms │        39.59 / 40.27 ±0.69 / 41.58 ms │     no change │
│ QQuery 61 │        12.35 / 12.52 ±0.21 / 12.93 ms │        12.60 / 12.73 ±0.15 / 13.02 ms │     no change │
│ QQuery 62 │        46.45 / 47.03 ±0.51 / 47.79 ms │        46.22 / 46.43 ±0.22 / 46.82 ms │     no change │
│ QQuery 63 │        29.79 / 30.03 ±0.27 / 30.50 ms │        29.90 / 30.14 ±0.13 / 30.27 ms │     no change │
│ QQuery 64 │    410.82 / 423.58 ±12.27 / 443.75 ms │     408.39 / 415.09 ±6.76 / 425.01 ms │     no change │
│ QQuery 65 │     154.74 / 159.81 ±4.59 / 167.76 ms │     141.85 / 144.49 ±2.32 / 148.57 ms │ +1.11x faster │
│ QQuery 66 │        85.40 / 86.47 ±0.67 / 87.32 ms │        79.06 / 79.50 ±0.37 / 79.93 ms │ +1.09x faster │
│ QQuery 67 │    265.23 / 273.79 ±10.47 / 294.15 ms │     237.81 / 242.42 ±4.56 / 250.95 ms │ +1.13x faster │
│ QQuery 68 │        11.97 / 12.17 ±0.25 / 12.66 ms │        11.91 / 12.08 ±0.19 / 12.46 ms │     no change │
│ QQuery 69 │        57.43 / 57.96 ±0.49 / 58.83 ms │        58.78 / 60.17 ±2.14 / 64.42 ms │     no change │
│ QQuery 70 │     105.29 / 110.36 ±7.11 / 124.16 ms │     104.78 / 107.04 ±2.30 / 111.39 ms │     no change │
│ QQuery 71 │        35.49 / 36.11 ±0.52 / 37.08 ms │        35.20 / 35.82 ±0.54 / 36.64 ms │     no change │
│ QQuery 72 │ 2098.62 / 2148.50 ±50.13 / 2236.60 ms │ 2144.19 / 2247.86 ±82.17 / 2394.52 ms │     no change │
│ QQuery 73 │          9.65 / 9.92 ±0.22 / 10.24 ms │         9.74 / 10.01 ±0.29 / 10.45 ms │     no change │
│ QQuery 74 │     170.45 / 174.40 ±3.77 / 179.43 ms │     166.95 / 171.09 ±4.80 / 180.15 ms │     no change │
│ QQuery 75 │     149.41 / 151.02 ±1.61 / 154.05 ms │     148.47 / 149.74 ±1.02 / 150.88 ms │     no change │
│ QQuery 76 │        35.17 / 35.65 ±0.31 / 36.10 ms │        35.03 / 35.54 ±0.57 / 36.39 ms │     no change │
│ QQuery 77 │        61.42 / 65.04 ±3.90 / 71.61 ms │        61.13 / 61.49 ±0.34 / 62.11 ms │ +1.06x faster │
│ QQuery 78 │     218.67 / 224.47 ±5.22 / 233.96 ms │     194.52 / 198.27 ±4.18 / 205.00 ms │ +1.13x faster │
│ QQuery 79 │        72.20 / 72.85 ±0.35 / 73.21 ms │        67.06 / 68.06 ±1.46 / 70.96 ms │ +1.07x faster │
│ QQuery 80 │     106.93 / 113.35 ±4.85 / 121.48 ms │      98.68 / 100.35 ±1.25 / 102.43 ms │ +1.13x faster │
│ QQuery 81 │        28.75 / 29.08 ±0.46 / 29.95 ms │        25.94 / 26.11 ±0.18 / 26.45 ms │ +1.11x faster │
│ QQuery 82 │        18.10 / 18.29 ±0.15 / 18.49 ms │        16.62 / 16.67 ±0.05 / 16.76 ms │ +1.10x faster │
│ QQuery 83 │        42.77 / 43.29 ±0.71 / 44.65 ms │        40.40 / 41.06 ±0.90 / 42.85 ms │ +1.05x faster │
│ QQuery 84 │        31.79 / 32.01 ±0.18 / 32.33 ms │        30.47 / 30.74 ±0.19 / 30.97 ms │     no change │
│ QQuery 85 │     112.76 / 115.84 ±3.78 / 122.97 ms │     107.16 / 110.48 ±3.46 / 117.11 ms │     no change │
│ QQuery 86 │        26.61 / 26.84 ±0.27 / 27.36 ms │        24.84 / 25.07 ±0.12 / 25.18 ms │ +1.07x faster │
│ QQuery 87 │        68.02 / 68.95 ±1.02 / 70.89 ms │        62.92 / 63.41 ±0.36 / 63.79 ms │ +1.09x faster │
│ QQuery 88 │        65.42 / 66.28 ±0.73 / 67.43 ms │        62.99 / 64.18 ±0.70 / 65.19 ms │     no change │
│ QQuery 89 │        37.84 / 38.36 ±0.62 / 39.46 ms │        36.33 / 36.87 ±0.52 / 37.82 ms │     no change │
│ QQuery 90 │        18.30 / 18.46 ±0.13 / 18.67 ms │        17.35 / 17.59 ±0.18 / 17.85 ms │     no change │
│ QQuery 91 │        46.87 / 48.23 ±0.75 / 49.17 ms │        46.82 / 47.19 ±0.21 / 47.45 ms │     no change │
│ QQuery 92 │        29.78 / 31.76 ±3.02 / 37.77 ms │        29.76 / 30.39 ±0.52 / 31.14 ms │     no change │
│ QQuery 93 │        50.74 / 51.67 ±0.85 / 53.16 ms │        50.55 / 51.03 ±0.32 / 51.45 ms │     no change │
│ QQuery 94 │        38.98 / 39.29 ±0.24 / 39.69 ms │        41.21 / 43.52 ±2.76 / 48.81 ms │  1.11x slower │
│ QQuery 95 │        81.49 / 82.27 ±0.41 / 82.62 ms │        86.47 / 87.49 ±0.57 / 88.01 ms │  1.06x slower │
│ QQuery 96 │        24.43 / 26.14 ±2.70 / 31.51 ms │        25.46 / 25.57 ±0.07 / 25.65 ms │     no change │
│ QQuery 97 │        47.12 / 47.81 ±0.63 / 48.94 ms │        49.07 / 49.89 ±0.87 / 51.55 ms │     no change │
│ QQuery 98 │        42.35 / 43.27 ±1.04 / 45.20 ms │        46.85 / 49.09 ±3.30 / 55.64 ms │  1.13x slower │
│ QQuery 99 │        70.76 / 71.14 ±0.37 / 71.77 ms │        73.10 / 73.81 ±0.46 / 74.27 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 10164.89ms │
│ Total Time (split-aggr-simplify-ht)   │ 10189.70ms │
│ Average Time (HEAD)                   │   102.68ms │
│ Average Time (split-aggr-simplify-ht) │   102.93ms │
│ Queries Faster                        │         22 │
│ Queries Slower                        │         10 │
│ Queries with No Change                │         67 │
│ Queries with Failure                  │          0 │
└───────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	2.0 GiB
Avg memory	1.4 GiB
CPU user	228.8s
CPU sys	5.8s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	2.2 GiB
Avg memory	1.5 GiB
CPU user	233.8s
CPU sys	5.7s
Peak spill	0 B

File an issue against this benchmark runner

adriangbot · 2026-07-03T21:33:30Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and split-aggr-simplify-ht
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃                split-aggr-simplify-ht ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.23 / 4.05 ±5.49 / 15.03 ms │          1.24 / 3.97 ±5.40 / 14.77 ms │     no change │
│ QQuery 1  │        12.66 / 13.02 ±0.19 / 13.16 ms │        12.34 / 12.63 ±0.15 / 12.78 ms │     no change │
│ QQuery 2  │        35.82 / 36.11 ±0.24 / 36.38 ms │        35.83 / 36.19 ±0.28 / 36.56 ms │     no change │
│ QQuery 3  │        30.77 / 31.62 ±1.12 / 33.83 ms │        30.51 / 30.92 ±0.46 / 31.72 ms │     no change │
│ QQuery 4  │     223.84 / 227.07 ±2.41 / 230.03 ms │     223.49 / 226.15 ±2.47 / 230.33 ms │     no change │
│ QQuery 5  │     273.31 / 273.96 ±0.54 / 274.91 ms │     271.76 / 272.72 ±1.12 / 274.60 ms │     no change │
│ QQuery 6  │           1.31 / 1.47 ±0.23 / 1.91 ms │           1.28 / 1.43 ±0.22 / 1.86 ms │     no change │
│ QQuery 7  │        13.86 / 14.04 ±0.12 / 14.23 ms │        13.76 / 13.98 ±0.13 / 14.15 ms │     no change │
│ QQuery 8  │     323.12 / 326.73 ±2.02 / 329.19 ms │     316.74 / 322.16 ±2.75 / 324.43 ms │     no change │
│ QQuery 9  │    438.72 / 465.95 ±14.14 / 478.29 ms │     453.60 / 461.13 ±6.62 / 471.50 ms │     no change │
│ QQuery 10 │        69.97 / 70.81 ±0.83 / 71.85 ms │        71.15 / 72.41 ±1.30 / 74.76 ms │     no change │
│ QQuery 11 │        80.88 / 82.73 ±2.48 / 87.64 ms │        82.22 / 83.65 ±1.26 / 85.93 ms │     no change │
│ QQuery 12 │     267.15 / 271.64 ±4.61 / 277.44 ms │     266.16 / 274.36 ±8.00 / 284.98 ms │     no change │
│ QQuery 13 │     367.13 / 377.31 ±9.65 / 394.53 ms │    364.32 / 380.35 ±14.23 / 402.66 ms │     no change │
│ QQuery 14 │     281.15 / 283.77 ±2.21 / 287.64 ms │     278.90 / 283.77 ±5.89 / 294.76 ms │     no change │
│ QQuery 15 │     274.29 / 281.90 ±7.93 / 295.76 ms │     268.77 / 274.53 ±3.27 / 277.48 ms │     no change │
│ QQuery 16 │     612.00 / 619.45 ±6.46 / 629.63 ms │     614.30 / 618.34 ±3.44 / 624.36 ms │     no change │
│ QQuery 17 │     628.63 / 634.48 ±4.92 / 643.40 ms │     614.06 / 626.81 ±9.07 / 641.75 ms │     no change │
│ QQuery 18 │ 1238.98 / 1265.03 ±19.70 / 1294.62 ms │ 1240.74 / 1260.59 ±15.76 / 1283.65 ms │     no change │
│ QQuery 19 │        27.70 / 29.25 ±2.75 / 34.74 ms │        27.46 / 30.29 ±4.64 / 39.46 ms │     no change │
│ QQuery 20 │     517.07 / 528.49 ±6.79 / 535.75 ms │     515.30 / 519.37 ±2.74 / 522.02 ms │     no change │
│ QQuery 21 │     512.11 / 518.18 ±5.81 / 529.24 ms │     513.94 / 519.96 ±5.72 / 529.95 ms │     no change │
│ QQuery 22 │    985.77 / 996.83 ±8.63 / 1008.67 ms │    979.96 / 988.81 ±9.32 / 1005.15 ms │     no change │
│ QQuery 23 │ 3044.81 / 3070.47 ±20.65 / 3099.53 ms │ 3083.92 / 3107.22 ±24.64 / 3151.60 ms │     no change │
│ QQuery 24 │        41.03 / 47.91 ±8.30 / 60.86 ms │       42.10 / 50.69 ±10.43 / 66.91 ms │  1.06x slower │
│ QQuery 25 │     111.43 / 112.34 ±1.01 / 114.24 ms │     111.74 / 113.16 ±1.57 / 116.17 ms │     no change │
│ QQuery 26 │        41.73 / 42.32 ±0.41 / 43.00 ms │        41.47 / 41.95 ±0.57 / 43.00 ms │     no change │
│ QQuery 27 │     665.83 / 673.12 ±5.60 / 682.99 ms │     673.06 / 676.68 ±2.71 / 679.75 ms │     no change │
│ QQuery 28 │  3037.33 / 3045.50 ±6.08 / 3055.04 ms │ 3021.32 / 3051.62 ±17.60 / 3076.16 ms │     no change │
│ QQuery 29 │        40.47 / 44.60 ±7.69 / 59.97 ms │        40.65 / 40.84 ±0.13 / 41.03 ms │ +1.09x faster │
│ QQuery 30 │    305.50 / 315.80 ±10.20 / 334.07 ms │    302.12 / 314.55 ±16.25 / 346.64 ms │     no change │
│ QQuery 31 │    287.86 / 296.56 ±10.55 / 312.63 ms │     281.08 / 295.23 ±8.23 / 306.84 ms │     no change │
│ QQuery 32 │    918.23 / 949.74 ±20.64 / 969.99 ms │    911.77 / 926.81 ±15.75 / 954.52 ms │     no change │
│ QQuery 33 │ 1439.46 / 1464.72 ±23.17 / 1507.76 ms │ 1428.11 / 1461.49 ±24.67 / 1491.91 ms │     no change │
│ QQuery 34 │  1475.01 / 1481.97 ±4.32 / 1486.36 ms │ 1478.44 / 1513.12 ±29.95 / 1558.43 ms │     no change │
│ QQuery 35 │    278.56 / 295.84 ±23.63 / 342.62 ms │    276.20 / 289.44 ±14.85 / 316.56 ms │     no change │
│ QQuery 36 │        65.78 / 71.55 ±4.17 / 78.28 ms │        66.51 / 70.28 ±4.34 / 77.41 ms │     no change │
│ QQuery 37 │        35.91 / 40.60 ±4.08 / 46.64 ms │        35.27 / 38.76 ±3.21 / 44.53 ms │     no change │
│ QQuery 38 │        40.98 / 43.78 ±2.75 / 48.74 ms │        41.27 / 43.90 ±2.78 / 48.66 ms │     no change │
│ QQuery 39 │     136.07 / 147.42 ±7.73 / 157.86 ms │     142.77 / 153.15 ±8.31 / 165.88 ms │     no change │
│ QQuery 40 │        14.09 / 16.36 ±3.13 / 22.45 ms │        14.02 / 16.88 ±5.24 / 27.36 ms │     no change │
│ QQuery 41 │        13.65 / 13.93 ±0.22 / 14.19 ms │        13.58 / 14.14 ±0.55 / 14.98 ms │     no change │
│ QQuery 42 │        13.44 / 15.57 ±4.08 / 23.72 ms │        13.25 / 15.26 ±3.76 / 22.78 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 19543.95ms │
│ Total Time (split-aggr-simplify-ht)   │ 19549.70ms │
│ Average Time (HEAD)                   │   454.51ms │
│ Average Time (split-aggr-simplify-ht) │   454.64ms │
│ Queries Faster                        │          1 │
│ Queries Slower                        │          1 │
│ Queries with No Change                │         41 │
│ Queries with Failure                  │          0 │
└───────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	12.2 GiB
Avg memory	4.6 GiB
CPU user	1006.4s
CPU sys	65.8s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	12.0 GiB
Avg memory	4.4 GiB
CPU user	1007.0s
CPU sys	66.1s
Peak spill	0 B

File an issue against this benchmark runner

Rachelint · 2026-07-04T02:02:35Z

+                .zip(evaluated_batch.accumulator_args.iter())
+            {
+                match mode {
+                    AggregateTableMode::Partial | AggregateTableMode::Single => {


I am concerned codes here seem switching to the way like row_hash.rs: all logic in one place, and using if else to decide where we go.

I wonder if it would help to factor these decisions into other abstractions? Like maybe a method on AggregateTableMode 🤔

Rachelint · 2026-07-04T02:08:49Z

+        // Filters apply only when the table consumes raw input rows. Final and
+        // partial-reduce modes consume partial states, so their filters are not
+        // applicable.
+        let filters = match mode {


Same concern like what in aggregate_batch

Rachelint · 2026-07-04T02:18:06Z

+/// - `Final`: merge_batch + evaluate
+/// - `PartialReduce`: merge_batch + state
+/// - `Single`: update_batch + evaluate
+pub(in crate::aggregates) struct AggregateHashTable {


I think maybe we should keep different AggregateHashTables for different usage, and remove duplicated codes based on this?
Due to :

Concerns in aggregate_batch and other methods.

Some method like partial_skip_table, it is logic for partial aggr and seems not common.

For Proposed alternative

I have considered this approach: separating aggregate hash tables by mode and extracting shared internal utilities.

My concern is that this would create a shallow module. To fully understand the implementation, we would still need to reason about the same underlying mode differences, while the external AggregateHashTable variants would add complexity with almost no functional contribution.

So I think keeping the mode as an internal flag is simpler overall.

Complexity concern

For the concern about growing internal complexity, see the rationale in the issue. I think this should remain manageable in the long term.

Partial skip table

Yes, this is a valid concern. The partial skip table logic could be moved out of this module entirely to make the structure cleaner. However, I think this can be delayed a bit, and I plan to do it as a follow-up.
#23113

My concern is that this would create a shallow module. To fully understand the implementation, we would still need to reason about the same underlying mode differences, while the external AggregateHashTable variants would add complexity with almost no functional contribution.

One idea could be to leave the structure that is already on main, and refactor the common behavior out as templated functions on the base class

So for example, leave AggregateHashTable<AggrMode>

But then in the parts that are very similar like aggregate_batch instead of

impl AggregateHashTable<PartialReduceMarker> { ... pub(in crate::aggregates) fn aggregate_batch( &mut self, batch: &RecordBatch, ) -> Result<()> { let evaluated_batch = self.evaluate_batch(batch)?; let state = self.state.building_mut(); let timer = self.group_by_metrics.aggregation_time.timer(); for group_values in &evaluated_batch.grouping_set_args { state .group_values .intern(group_values, &mut state.batch_group_indices)?; let group_indices = &state.batch_group_indices; let total_num_groups = state.group_values.len(); for (acc, values) in state .accumulators .iter_mut() .zip(evaluated_batch.accumulator_args.iter()) { acc.merge_batch(values, group_indices, total_num_groups)?; } } drop(timer); Ok(()) } .. }

It could be structured something so that the specialization was called

impl AggregateHashTable<PartialReduceMarker> { ... pub(in crate::aggregates) fn aggregate_batch( &mut self, batch: &RecordBatch, ) -> Result<()> { // call a templated function with a closure with the hash table specific functionality self.aggregate_batch_inner(batch, |acc, values, group_indices, total_num_groups| { acc.merge_batch(values, group_indices, total_num_groups) })?; } ... }

And then move the common logic into the generic method

/// Implement in "base" class (not the specialization) impl AggregateHashTable<AggrMode> { ... fn <F: Fn(...) -> Result<()>)aggregate_batch_inner( batch: &RecordBatch, agg_function: F, ) -> Result<()> { let evaluated_batch = self.evaluate_batch(batch)?; let state = self.state.building_mut(); let timer = self.group_by_metrics.aggregation_time.timer(); for group_values in &evaluated_batch.grouping_set_args { state .group_values .intern(group_values, &mut state.batch_group_indices)?; let group_indices = &state.batch_group_indices; let total_num_groups = state.group_values.len(); for (acc, values) in state .accumulators .iter_mut() .zip(evaluated_batch.accumulator_args.iter()) { // *** Note here call the generic function agg_function: F agg_function(values, group_indices, total_num_groups)?; <--------- } } drop(timer); Ok(()) } } }

The templated-function approach seems like a good idea. I’ll give it a try and see how it looks.

If the templated function is not expressive enough in some places/at some point, we can still fully separate the implementation by aggregation variant.

2010YOUY01 · 2026-07-04T05:28:25Z

Thanks for the review @Rachelint and @alamb

If there are other ideas for making the structure simpler and easier to understand, I’d be happy to discuss and experiment further. I think managing complexity is a hard but important problem, and it’s interesting to figure out methodologies to address it.

refactor(hash-aggr): consolidate aggregate hash table

7f68c23

github-actions Bot added the physical-plan Changes to the physical-plan crate label Jul 3, 2026

2010YOUY01 mentioned this pull request Jul 3, 2026

feat(hash-aggr): Migrate Single/SinglePartitioned aggregation 2010YOUY01/arrow-datafusion#5

Open

alamb approved these changes Jul 3, 2026

View reviewed changes

Rachelint reviewed Jul 4, 2026

View reviewed changes

2010YOUY01 mentioned this pull request Jul 4, 2026

Simplify aggregate table for ordered cases #23318

Open

2010YOUY01 added 2 commits July 4, 2026 13:18

fix wrong link

77532c6

review: swap inner/outer loop for mode

6871dc7

Uh oh!

Conversation

2010YOUY01 commented Jul 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alamb commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

adriangbot commented Jul 3, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

For Proposed alternative

Complexity concern

Partial skip table

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2010YOUY01 commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

2010YOUY01 Jul 4, 2026 •

edited

Loading