diff --git a/README.md b/README.md index 45f7791..074ac30 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,12 @@ on the project site. Try it live on the Antalya demo cluster: **https://antalya. The [**ontime chart demo**](docs/ONTIME-CHART-DEMO.md) is a ready-made library of 10 queries (load [`examples/ontime-charts.json`](examples/ontime-charts.json) via **File ▾ → Open**) that walks through every chart type and feature against the public -`ontime` flight dataset. +`ontime` flight dataset. The [**system explorer demo**](docs/SYSTEM-EXPLORER-DEMO.md) +is a 14-query library (load [`examples/system-explorer-charts.json`](examples/system-explorer-charts.json) +via **File ▾ → Append**) that introspects ClickHouse's own `system` database — +running queries, merges/replication health, and historical query/part/error +activity — with a shared From/To filter driving every time-ranged Dashboard tile +at once. ## How it works diff --git a/docs/SYSTEM-EXPLORER-DEMO.md b/docs/SYSTEM-EXPLORER-DEMO.md new file mode 100644 index 0000000..0c1c541 --- /dev/null +++ b/docs/SYSTEM-EXPLORER-DEMO.md @@ -0,0 +1,90 @@ +# System explorer demo — introspecting ClickHouse itself + +A ready-made **Library** of 14 queries against ClickHouse's own `system` database — +running queries, merges/mutations/replication health, storage, and historical +query/part/error activity — running on the OSS `github.demo` cluster. Ideas and +query shapes are adapted from Mikhail Filimonov's +[ClickHouse ops Grafana dashboard](https://gist.github.com/filimonov/271e5b27c085356c67db3c1bf2204506) +(68 panels covering `metric_log`, `asynchronous_metric_log`, `query_log`, +`query_views_log`, `part_log`, and `error_log`) — not ported 1:1 (no Grafana +template macros, no per-cluster time-series for every background-pool metric), +just enough to show the shape of "explore your own cluster" as a Library + +Dashboard, not a full monitoring reimplementation. 
+ +The six historical queries (#9–#14) share **one pair of query variables**, +`{from:String}`/`{to:String}` (parsed with `parseDateTimeBestEffort`), instead +of each hardcoding its own `now() - INTERVAL …` window. Same names everywhere +means the Dashboard's global filter bar (#149 D3) renders a single **From / +To** field pair that re-runs all five time-ranged tiles (#9–#13) together when +you type a new range: one filter, five charts (table-only #14 is not a tile). + +- **Live demo:** **https://github.demo.altinity.cloud/sql** +- **The library file:** [`examples/system-explorer-charts.json`](../examples/system-explorer-charts.json) + ([raw download](https://raw.githubusercontent.com/Altinity/altinity-sql-browser/main/examples/system-explorer-charts.json)) +- **Reproduce it:** [`examples/build-system-explorer-charts.mjs`](../examples/build-system-explorer-charts.mjs) + regenerates the JSON (it derives each chart's schema key live via a + `DESCRIBE`-equivalent `FORMAT JSON` query, with throwaway `--param_from`/`--param_to` + values bound just so ClickHouse can resolve column types — the shipped SQL keeps + the placeholders unbound for the browser to fill in). + +## Load it (≈30 seconds) + +1. Open **https://github.demo.altinity.cloud/sql** and sign in (**Continue with + GitHub** via Auth0, or use the credentials box — see + [LOGIN-SCREEN.md](LOGIN-SCREEN.md) for what each login path grants). +2. Download [`system-explorer-charts.json`](https://raw.githubusercontent.com/Altinity/altinity-sql-browser/main/examples/system-explorer-charts.json) + (right-click → Save link as…). +3. In the header, click **File ▾ → Append…** and pick the file (Append merges + into whatever's already in your Library, reporting `Added N`; use **Open…** + instead if you'd rather replace the whole Library). Eight of the fourteen + queries import already **favorited**. +4. Click **File ▾ → "Open as dashboard"** (or the Dashboard link in the + sidebar). 
Three KPI-less tiles (#6–8, live snapshots) render immediately; + the five time-ranged tiles (#9–13; the table-only #14 is excluded) show an "Enter + a value for: from, to" placeholder until you type a range into the + dashboard's **From / To** filter fields — then all five re-run together. + `parseDateTimeBestEffort` accepts most absolute formats; e.g. From + `2026-07-01 00:00:00`, To `2026-07-05 00:00:00` (there's no relative + `now`/`today` shorthand — type a real timestamp for "to" as well). + +## What each query demonstrates + +| # | Query | View | What it shows | +|---|-------|------|----------------| +| 1 | Currently running queries | Table | `system.processes` live snapshot — often empty; that's a real result | +| 2 | Merges in progress | Table | `system.merges` — background merge progress + size | +| 3 | Mutations in progress | Table | `system.mutations WHERE NOT is_done`, with failure reason | +| 4 | Replication status | Table | `system.replicas` — delay, queue depth, leadership | +| 5 | Stuck replication queue entries | Table | `system.replication_queue WHERE num_tries > 0` | +| 6 | Largest tables by disk usage | Bar (horizontal) | `system.parts` summed per table | +| 7 | Active parts by table | Bar (horizontal) | part *count* per table — an early "too many parts" signal | +| 8 | Cumulative error counters | Bar (horizontal) | `system.errors` — every error code hit since restart | +| 9 | Queries per minute | Line | `system.query_log` bucketed per minute over `{from}`/`{to}`; `DateTime` axis auto-detected as time | +| 10 | Slowest query patterns — avg duration | Bar (horizontal) | `query_log` over `{from}`/`{to}`, grouped by `normalized_query_hash`, a non-count measure | +| 11 | Query errors over time | Grouped bars | `query_log` failures over `{from}`/`{to}`, **Series** = error name | +| 12 | Part lifecycle events over time | Grouped bars | `part_log` over `{from}`/`{to}`, **Series** = `event_type` (an `Enum8` column) | +| 13 | Memory usage over time | Line | 
`system.metric_log`'s `CurrentMetric_MemoryTracking` over `{from}`/`{to}`, averaged per minute | +| 14 | Query cost breakdown — slowest patterns (detail) | Table | the deep-dive version of #10 over `{from}`/`{to}`: executions, rows/bytes read, p99 memory | + +Rows 1–8 need only `SELECT` on the relevant `system.*` table; rows 9–14 also +read `system.query_log`/`system.part_log`/`system.metric_log`, which most +demo/read-only users won't have — sign in with an account that has broader +`system` grants (or run it against your own cluster as an admin) to see all +fourteen populate. + +## Direct links + +Every chartable query is also reachable as a single shareable link — open one +and the SQL **and** its chart configuration are pre-loaded. Rows 6–8 need only +**Run**, then the **Chart** tab; rows 9–13 also need `from`/`to` values typed +into the variable strip below the editor before Run is enabled (the link +itself can't carry a variable *value*, only the query). + +- **Bar** — [Largest tables by disk usage](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIGNvbmNhdChkYXRhYmFzZSwgJy4nLCB0YWJsZSkgQVMgdGFibGUsIHN1bShieXRlc19vbl9kaXNrKSBBUyBkaXNrX2J5dGVzXG5GUk9NIHN5c3RlbS5wYXJ0c1xuV0hFUkUgYWN0aXZlXG5HUk9VUCBCWSBkYXRhYmFzZSwgdGFibGVcbk9SREVSIEJZIGRpc2tfYnl0ZXMgREVTQ1xuTElNSVQgMTUiLCJjaGFydCI6eyJjZmciOnsidHlwZSI6ImhiYXIiLCJ4IjowLCJ5IjpbMV0sInNlcmllcyI6bnVsbH0sImtleSI6InRhYmxlOlN0cmluZ3xkaXNrX2J5dGVzOlVJbnQ2NCJ9fQ==) +- **Bar** — [Active parts by table](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIGNvbmNhdChkYXRhYmFzZSwgJy4nLCB0YWJsZSkgQVMgdGFibGUsIGNvdW50KCkgQVMgcGFydHNcbkZST00gc3lzdGVtLnBhcnRzXG5XSEVSRSBhY3RpdmVcbkdST1VQIEJZIGRhdGFiYXNlLCB0YWJsZVxuT1JERVIgQlkgcGFydHMgREVTQ1xuTElNSVQgMTUiLCJjaGFydCI6eyJjZmciOnsidHlwZSI6ImhiYXIiLCJ4IjowLCJ5IjpbMV0sInNlcmllcyI6bnVsbH0sImtleSI6InRhYmxlOlN0cmluZ3xwYXJ0czpVSW50NjQifX0=) +- **Bar** — [Cumulative error 
counters](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIG5hbWUsIHZhbHVlIEFTIHRpbWVzXG5GUk9NIHN5c3RlbS5lcnJvcnNcbldIRVJFIHZhbHVlID4gMFxuT1JERVIgQlkgdmFsdWUgREVTQ1xuTElNSVQgMTUiLCJjaGFydCI6eyJjZmciOnsidHlwZSI6ImhiYXIiLCJ4IjowLCJ5IjpbMV0sInNlcmllcyI6bnVsbH0sImtleSI6Im5hbWU6U3RyaW5nfHRpbWVzOlVJbnQ2NCJ9fQ==) +- **Line** — [Queries per minute](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIHRvU3RhcnRPZk1pbnV0ZShldmVudF90aW1lKSBBUyB0LCBjb3VudCgpIEFTIHF1ZXJpZXNcbkZST00gc3lzdGVtLnF1ZXJ5X2xvZ1xuV0hFUkUgZXZlbnRfdGltZSBCRVRXRUVOIHBhcnNlRGF0ZVRpbWVCZXN0RWZmb3J0KHtmcm9tOlN0cmluZ30pIEFORCBwYXJzZURhdGVUaW1lQmVzdEVmZm9ydCh7dG86U3RyaW5nfSkgQU5EIHR5cGUgPSAnUXVlcnlGaW5pc2gnXG5HUk9VUCBCWSB0XG5PUkRFUiBCWSB0IiwiY2hhcnQiOnsiY2ZnIjp7InR5cGUiOiJsaW5lIiwieCI6MCwieSI6WzFdLCJzZXJpZXMiOm51bGx9LCJrZXkiOiJ0OkRhdGVUaW1lfHF1ZXJpZXM6VUludDY0In19) +- **Bar** — [Slowest query patterns — avg duration](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIGxlZnQoYW55KHF1ZXJ5KSwgNTApIEFTIHF1ZXJ5LCBhdmcocXVlcnlfZHVyYXRpb25fbXMpIEFTIGF2Z19kdXJhdGlvbl9tc1xuRlJPTSBzeXN0ZW0ucXVlcnlfbG9nXG5XSEVSRSBldmVudF90aW1lIEJFVFdFRU4gcGFyc2VEYXRlVGltZUJlc3RFZmZvcnQoe2Zyb206U3RyaW5nfSkgQU5EIHBhcnNlRGF0ZVRpbWVCZXN0RWZmb3J0KHt0bzpTdHJpbmd9KSBBTkQgdHlwZSA9ICdRdWVyeUZpbmlzaCdcbkdST1VQIEJZIG5vcm1hbGl6ZWRfcXVlcnlfaGFzaFxuT1JERVIgQlkgYXZnX2R1cmF0aW9uX21zIERFU0NcbkxJTUlUIDE1IiwiY2hhcnQiOnsiY2ZnIjp7InR5cGUiOiJoYmFyIiwieCI6MCwieSI6WzFdLCJzZXJpZXMiOm51bGx9LCJrZXkiOiJxdWVyeTpTdHJpbmd8YXZnX2R1cmF0aW9uX21zOkZsb2F0NjQifX0=) +- **Grouped bars** — [Query errors over 
time](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIHRvU3RhcnRPZkhvdXIoZXZlbnRfdGltZSkgQVMgdCwgZXJyb3JDb2RlVG9OYW1lKGV4Y2VwdGlvbl9jb2RlKSBBUyBlcnJvciwgY291bnQoKSBBUyBuXG5GUk9NIHN5c3RlbS5xdWVyeV9sb2dcbldIRVJFIGV2ZW50X3RpbWUgQkVUV0VFTiBwYXJzZURhdGVUaW1lQmVzdEVmZm9ydCh7ZnJvbTpTdHJpbmd9KSBBTkQgcGFyc2VEYXRlVGltZUJlc3RFZmZvcnQoe3RvOlN0cmluZ30pIEFORCBleGNlcHRpb25fY29kZSAhPSAwXG5HUk9VUCBCWSB0LCBlcnJvclxuT1JERVIgQlkgdCIsImNoYXJ0Ijp7ImNmZyI6eyJ0eXBlIjoiYmFyIiwieCI6MCwieSI6WzJdLCJzZXJpZXMiOjF9LCJrZXkiOiJ0OkRhdGVUaW1lfGVycm9yOkxvd0NhcmRpbmFsaXR5KFN0cmluZyl8bjpVSW50NjQifX0=) +- **Grouped bars** — [Part lifecycle events over time](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIHRvU3RhcnRPZkhvdXIoZXZlbnRfdGltZSkgQVMgdCwgZXZlbnRfdHlwZSwgY291bnQoKSBBUyBuXG5GUk9NIHN5c3RlbS5wYXJ0X2xvZ1xuV0hFUkUgZXZlbnRfdGltZSBCRVRXRUVOIHBhcnNlRGF0ZVRpbWVCZXN0RWZmb3J0KHtmcm9tOlN0cmluZ30pIEFORCBwYXJzZURhdGVUaW1lQmVzdEVmZm9ydCh7dG86U3RyaW5nfSlcbkdST1VQIEJZIHQsIGV2ZW50X3R5cGVcbk9SREVSIEJZIHQiLCJjaGFydCI6eyJjZmciOnsidHlwZSI6ImJhciIsIngiOjAsInkiOlsyXSwic2VyaWVzIjoxfSwia2V5IjoidDpEYXRlVGltZXxldmVudF90eXBlOkVudW04KCdOZXdQYXJ0JyA9IDEsICdNZXJnZVBhcnRzJyA9IDIsICdEb3dubG9hZFBhcnQnID0gMywgJ1JlbW92ZVBhcnQnID0gNCwgJ011dGF0ZVBhcnQnID0gNSwgJ01vdmVQYXJ0JyA9IDYsICdNZXJnZVBhcnRzU3RhcnQnID0gNywgJ011dGF0ZVBhcnRTdGFydCcgPSA4KXxuOlVJbnQ2NCJ9fQ==) +- **Line** — [Memory usage over time](https://github.demo.altinity.cloud/sql#eyJfX2FzYiI6MSwic3FsIjoiU0VMRUNUIHRvU3RhcnRPZk1pbnV0ZShldmVudF90aW1lKSBBUyB0LCBhdmcoQ3VycmVudE1ldHJpY19NZW1vcnlUcmFja2luZykgQVMgbWVtb3J5X2J5dGVzXG5GUk9NIHN5c3RlbS5tZXRyaWNfbG9nXG5XSEVSRSBldmVudF90aW1lIEJFVFdFRU4gcGFyc2VEYXRlVGltZUJlc3RFZmZvcnQoe2Zyb206U3RyaW5nfSkgQU5EIHBhcnNlRGF0ZVRpbWVCZXN0RWZmb3J0KHt0bzpTdHJpbmd9KVxuR1JPVVAgQlkgdFxuT1JERVIgQlkgdCIsImNoYXJ0Ijp7ImNmZyI6eyJ0eXBlIjoibGluZSIsIngiOjAsInkiOlsxXSwic2VyaWVzIjpudWxsfSwia2V5IjoidDpEYXRlVGltZXxtZW1vcnlfYnl0ZXM6RmxvYXQ2NCJ9fQ==) diff --git a/docs/local-app.html b/docs/local-app.html index 
7890e63..8ccfdda 100644 --- a/docs/local-app.html +++ b/docs/local-app.html @@ -149,7 +149,9 @@

Bundled demo endpoints

altinity-demo

Altinity github.demo cluster.

Want a guided tour? Load the ontime chart demo — - ten ready-made queries that walk through every chart type.

+ ten ready-made queries that walk through every chart type — or the + system explorer demo, + fourteen queries that introspect ClickHouse's own system database.

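For reference, the fragment after `#` in each direct link in `docs/SYSTEM-EXPLORER-DEMO.md` looks like base64-encoded JSON carrying the SQL and its chart configuration. The Node.js sketch below shows that round trip under two assumptions: standard base64 over UTF-8, and a `{ __asb, sql, chart }` payload shape inferred from the links themselves. `encodeShareLink` and `decodeShareLink` are hypothetical names, not functions taken from the app.

```javascript
// Hypothetical helper: build a shareable link by base64-encoding a JSON
// payload into the URL fragment. The payload shape is an assumption inferred
// from the links in docs/SYSTEM-EXPLORER-DEMO.md.
function encodeShareLink(baseUrl, payload) {
  const json = JSON.stringify(payload);
  return baseUrl + '#' + Buffer.from(json, 'utf8').toString('base64');
}

// Hypothetical helper: recover the payload from a link's fragment.
function decodeShareLink(url) {
  const fragment = new URL(url).hash.slice(1); // drop the leading '#'
  const json = Buffer.from(fragment, 'base64').toString('utf8');
  return JSON.parse(json);
}

// SQL, chart cfg, and key copied from the "Cumulative error counters" entry.
const payload = {
  __asb: 1,
  sql: 'SELECT name, value AS times\nFROM system.errors\nWHERE value > 0\nORDER BY value DESC\nLIMIT 15',
  chart: {
    cfg: { type: 'hbar', x: 0, y: [1], series: null },
    key: 'name:String|times:UInt64',
  },
};

const link = encodeShareLink('https://github.demo.altinity.cloud/sql', payload);
const decoded = decodeShareLink(link);

console.log(decoded.chart.key);
console.log(decoded.sql.split('\n')[1]);
```

Decoding a link this way is a quick way to check that its embedded `chart.key` still matches the key stored for the same query in `examples/system-explorer-charts.json`.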
diff --git a/examples/build-system-explorer-charts.mjs b/examples/build-system-explorer-charts.mjs new file mode 100644 index 0000000..42ac09f --- /dev/null +++ b/examples/build-system-explorer-charts.mjs @@ -0,0 +1,251 @@ +// Generator for examples/system-explorer-charts.json — a saved-queries +// "Library" file for the Altinity SQL Browser that explores ClickHouse's own +// system database: currently-running work, merges/mutations/replication +// health, storage, and historical query/part/error activity from the *_log +// tables. Ideas and query shapes are adapted (not ported 1:1 — no Grafana +// template macros) from Mikhail Filimonov's ClickHouse ops dashboard: +// https://gist.github.com/filimonov/271e5b27c085356c67db3c1bf2204506 +// +// Why a generator: the browser only restores a saved chart config when the +// entry's `chart.key` exactly equals schemaKey(resultColumns) = "name:type|…" +// (see src/ui/results.js chartCfgFor / src/core/chart-data.js schemaKey). +// Hand-writing those type strings is error-prone (Enum8/LowCardinality wrap +// exactly), so we derive each key live from a DESCRIBE-style `SELECT * FROM (sql) LIMIT 1` against a +// real cluster, read through FORMAT JSON so the type string matches exactly +// what the app's HTTP+JSON interface receives (clickhouse-client's default +// TSV output escapes embedded quotes differently and will silently produce a +// key that never matches at runtime). +// +// A handful of entries are plain live-snapshot tables (system.processes, +// system.merges, …) with no chart — those need SELECT privilege on the +// underlying table but no query history, and are commonly close to empty on +// an idle cluster (that's a legitimate result, not a bug). +// +// Every time-ranged *_log query shares two ClickHouse native query parameters, +// `{from:String}`/`{to:String}` (parsed via parseDateTimeBestEffort), instead +// of a hardcoded `now() - INTERVAL …`. 
Same param names across every entry +// means the Dashboard's global filter bar (#149 D3) renders ONE From/To pair +// that drives all five time-ranged tiles at once. DESCRIBE can't resolve an +// unbound parameter, so schemaKey() below binds throwaway test values via +// `--param_from`/`--param_to` purely to derive column types — the *shipped* +// SQL keeps the placeholders unbound for the browser to fill in. +// +// Run: node examples/build-system-explorer-charts.mjs [connection-name] +// Needs a `clickhouse-client` connection with SELECT on system.* — NOT the +// narrow "demo" fixture user some clusters expose by default (it can't read +// system.processes/query_log/etc). Defaults to `github-admin`; this file was +// authored against the github.demo cluster via +// kubectl exec chi-github-github-0-0-0 -c clickhouse-pod -- +// clickhouse-client --user clickhouse_operator --password "$PASS" ... +// since no adequately-privileged named CLI connection existed in that session. +// Out: examples/system-explorer-charts.json + +import { execFileSync } from 'node:child_process'; +import { writeFileSync } from 'node:fs'; +import { fileURLToPath } from 'node:url'; +import { dirname, resolve } from 'node:path'; + +const here = dirname(fileURLToPath(import.meta.url)); +const CONNECTION = process.argv[2] || 'github-admin'; + +// Each spec: a query + (for chartable ones) the chart we want it to open +// with. `cfg` matches the app's shape { type, x, y:[...], series }; x/series +// are column indices, y a list of measure-column indices. A spec with no +// `cfg` is table-only (live snapshots, plus the detail query #14): no chart, opens in Table view. +const SPECS = [ + { + name: 'Currently running queries', + description: 'Live snapshot of system.processes — every query executing right now, slowest first. 
Empty when the cluster is idle; that\'s a real "nothing running" result, not an error.', + sql: `SELECT query_id, user, elapsed, read_rows, formatReadableSize(memory_usage) AS memory, left(query, 80) AS query +FROM system.processes +ORDER BY elapsed DESC +LIMIT 20`, + }, + { + name: 'Merges in progress', + description: 'Live snapshot of system.merges — background merges currently running, with progress and compressed size. Usually empty between merge cycles on a small cluster.', + sql: `SELECT database, table, elapsed, round(progress * 100, 1) AS pct_done, num_parts, is_mutation, formatReadableSize(total_size_bytes_compressed) AS size +FROM system.merges +ORDER BY elapsed DESC +LIMIT 20`, + }, + { + name: 'Mutations in progress', + description: 'Unfinished ALTER UPDATE/DELETE mutations from system.mutations, with the failure reason if one is stuck retrying.', + sql: `SELECT database, table, mutation_id, command, parts_to_do, latest_fail_reason +FROM system.mutations +WHERE NOT is_done +ORDER BY create_time +LIMIT 20`, + }, + { + name: 'Replication status', + description: 'system.replicas health per table — leadership, read-only state, replication delay, and queue depth. Sorted worst-lag-first.', + sql: `SELECT database, table, is_leader, is_readonly, absolute_delay, queue_size, inserts_in_queue, merges_in_queue +FROM system.replicas +ORDER BY absolute_delay DESC +LIMIT 20`, + }, + { + name: 'Stuck replication queue entries', + description: 'system.replication_queue entries that have already failed and retried at least once, with the last exception — the first place to look when a replica falls behind.', + sql: `SELECT database, table, type, create_time, num_tries, last_exception +FROM system.replication_queue +WHERE num_tries > 0 +ORDER BY num_tries DESC +LIMIT 20`, + }, + { + name: 'Largest tables by disk usage', + description: 'Every active part in system.parts, summed per table, largest first. 
Horizontal Bar — hover any bar for the exact byte count.', + cfg: { type: 'hbar', x: 0, y: [1], series: null }, + sql: `SELECT concat(database, '.', table) AS table, sum(bytes_on_disk) AS disk_bytes +FROM system.parts +WHERE active +GROUP BY database, table +ORDER BY disk_bytes DESC +LIMIT 15`, + }, + { + name: 'Active parts by table', + description: 'Active part *count* per table (not size) — a table climbing here between refreshes is trending toward "too many parts". Horizontal Bar.', + cfg: { type: 'hbar', x: 0, y: [1], series: null }, + sql: `SELECT concat(database, '.', table) AS table, count() AS parts +FROM system.parts +WHERE active +GROUP BY database, table +ORDER BY parts DESC +LIMIT 15`, + }, + { + name: 'Cumulative error counters', + description: 'system.errors — every error code the server has hit since last restart, most frequent first. A quick "what\'s actually going wrong here" check. Horizontal Bar.', + cfg: { type: 'hbar', x: 0, y: [1], series: null }, + sql: `SELECT name, value AS times +FROM system.errors +WHERE value > 0 +ORDER BY value DESC +LIMIT 15`, + }, + { + name: 'Queries per minute', + description: 'Finished-query volume from system.query_log, bucketed per minute, over a {from:String}/{to:String} range (shared with every other time-ranged query below — the Dashboard filter bar renders one From/To pair that drives them all). A DateTime X axis is auto-detected as a time series → Line chart.', + cfg: { type: 'line', x: 0, y: [1], series: null }, + sql: `SELECT toStartOfMinute(event_time) AS t, count() AS queries +FROM system.query_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish' +GROUP BY t +ORDER BY t`, + }, + { + name: 'Slowest query patterns — avg duration', + description: 'Distinct query shapes (system.query_log grouped by normalized_query_hash) ranked by average duration over a {from:String}/{to:String} range. 
Horizontal Bar of a non-count measure.', + cfg: { type: 'hbar', x: 0, y: [1], series: null }, + sql: `SELECT left(any(query), 50) AS query, avg(query_duration_ms) AS avg_duration_ms +FROM system.query_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish' +GROUP BY normalized_query_hash +ORDER BY avg_duration_ms DESC +LIMIT 15`, + }, + { + name: 'Query errors over time', + description: 'Failed queries from system.query_log over a {from:String}/{to:String} range, broken down by ClickHouse error name. The "error" column is used as the Series, producing grouped/stacked bars per error code.', + cfg: { type: 'bar', x: 0, y: [2], series: 1 }, + sql: `SELECT toStartOfHour(event_time) AS t, errorCodeToName(exception_code) AS error, count() AS n +FROM system.query_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND exception_code != 0 +GROUP BY t, error +ORDER BY t`, + }, + { + name: 'Part lifecycle events over time', + description: 'system.part_log over a {from:String}/{to:String} range — new/merged/mutated/downloaded/removed parts per hour, one query instead of five separate panels. "event_type" is the Series.', + cfg: { type: 'bar', x: 0, y: [2], series: 1 }, + sql: `SELECT toStartOfHour(event_time) AS t, event_type, count() AS n +FROM system.part_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) +GROUP BY t, event_type +ORDER BY t`, + }, + { + name: 'Memory usage over time', + description: 'Average tracked memory (system.metric_log\'s CurrentMetric_MemoryTracking) per minute over a {from:String}/{to:String} range. 
Line chart.', + cfg: { type: 'line', x: 0, y: [1], series: null }, + sql: `SELECT toStartOfMinute(event_time) AS t, avg(CurrentMetric_MemoryTracking) AS memory_bytes +FROM system.metric_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) +GROUP BY t +ORDER BY t`, + }, + { + name: 'Query cost breakdown — slowest patterns (detail)', + description: 'The deep-dive version of "Slowest query patterns": executions, max/avg duration, rows and bytes read, and p99 memory per query shape over a {from:String}/{to:String} range. Table view — too many columns for one chart, but the full picture behind the bar chart above.', + sql: `SELECT + normalized_query_hash, + left(argMax(query, query_duration_ms), 60) AS sample_query, + count() AS executions, + max(query_duration_ms) AS max_ms, + avg(query_duration_ms) AS avg_ms, + sum(read_rows) AS read_rows, + formatReadableSize(sum(read_bytes)) AS read_bytes, + quantile(0.99)(memory_usage) AS p99_memory +FROM system.query_log +WHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish' +GROUP BY normalized_query_hash +ORDER BY avg_ms DESC +LIMIT 15`, + }, +]; + +// Throwaway values just to let DESCRIBE/FORMAT JSON resolve column types for +// queries that reference {from:String}/{to:String} — never shipped in the +// output; the JSON's `sql` keeps the placeholders unbound. +const TEST_FROM = '2026-07-01 00:00:00'; +const TEST_TO = '2026-07-08 00:00:00'; + +const ch = (query) => + execFileSync('clickhouse-client', [ + '--connection', CONNECTION, + '--param_from', TEST_FROM, + '--param_to', TEST_TO, + '--query', query, + ], { + encoding: 'utf8', + maxBuffer: 64 * 1024 * 1024, + }); + +// schemaKey == columns.map(c => c.name + ':' + c.type).join('|'), derived via +// FORMAT JSON (not clickhouse-client's default TSV, which escapes embedded +// quotes in e.g. an Enum8(...) 
type string differently from the HTTP+JSON +// interface the app actually uses) so it matches exactly what the browser +// receives at run time. +function schemaKey(sql) { + const out = JSON.parse(ch(`SELECT * FROM (${sql}) LIMIT 1 FORMAT JSON`)); + return out.meta.map((m) => `${m.name}:${m.type}`).join('|'); +} + +const queries = SPECS.map((s, i) => { + const base = { + id: 'sys-' + (i + 1), + name: s.name, + sql: s.sql, + favorite: !!s.cfg, + description: s.description, + }; + if (!s.cfg) return base; + const key = schemaKey(s.sql); + console.log(`#${i + 1} ${s.cfg.type.padEnd(4)} key=${key}`); + return { ...base, chart: { cfg: s.cfg, key }, view: 'chart' }; +}); + +const doc = { + format: 'altinity-sql-browser/saved-queries', + version: 1, + exportedAt: new Date().toISOString(), + queries, +}; + +const outPath = resolve(here, 'system-explorer-charts.json'); +writeFileSync(outPath, JSON.stringify(doc, null, 2) + '\n'); +console.log(`\nwrote ${outPath} (${queries.length} queries, ${queries.filter((q) => q.favorite).length} favorited for the Dashboard)`); diff --git a/examples/system-explorer-charts.json b/examples/system-explorer-charts.json new file mode 100644 index 0000000..bd7737b --- /dev/null +++ b/examples/system-explorer-charts.json @@ -0,0 +1,201 @@ +{ + "format": "altinity-sql-browser/saved-queries", + "version": 1, + "exportedAt": "2026-07-04T19:47:55.982Z", + "queries": [ + { + "id": "sys-1", + "name": "Currently running queries", + "sql": "SELECT query_id, user, elapsed, read_rows, formatReadableSize(memory_usage) AS memory, left(query, 80) AS query\nFROM system.processes\nORDER BY elapsed DESC\nLIMIT 20", + "favorite": false, + "description": "Live snapshot of system.processes — every query executing right now, slowest first. Empty when the cluster is idle; that's a real \"nothing running\" result, not an error." 
+ }, + { + "id": "sys-2", + "name": "Merges in progress", + "sql": "SELECT database, table, elapsed, round(progress * 100, 1) AS pct_done, num_parts, is_mutation, formatReadableSize(total_size_bytes_compressed) AS size\nFROM system.merges\nORDER BY elapsed DESC\nLIMIT 20", + "favorite": false, + "description": "Live snapshot of system.merges — background merges currently running, with progress and compressed size. Usually empty between merge cycles on a small cluster." + }, + { + "id": "sys-3", + "name": "Mutations in progress", + "sql": "SELECT database, table, mutation_id, command, parts_to_do, latest_fail_reason\nFROM system.mutations\nWHERE NOT is_done\nORDER BY create_time\nLIMIT 20", + "favorite": false, + "description": "Unfinished ALTER UPDATE/DELETE mutations from system.mutations, with the failure reason if one is stuck retrying." + }, + { + "id": "sys-4", + "name": "Replication status", + "sql": "SELECT database, table, is_leader, is_readonly, absolute_delay, queue_size, inserts_in_queue, merges_in_queue\nFROM system.replicas\nORDER BY absolute_delay DESC\nLIMIT 20", + "favorite": false, + "description": "system.replicas health per table — leadership, read-only state, replication delay, and queue depth. Sorted worst-lag-first." + }, + { + "id": "sys-5", + "name": "Stuck replication queue entries", + "sql": "SELECT database, table, type, create_time, num_tries, last_exception\nFROM system.replication_queue\nWHERE num_tries > 0\nORDER BY num_tries DESC\nLIMIT 20", + "favorite": false, + "description": "system.replication_queue entries that have already failed and retried at least once, with the last exception — the first place to look when a replica falls behind." 
+ }, + { + "id": "sys-6", + "name": "Largest tables by disk usage", + "sql": "SELECT concat(database, '.', table) AS table, sum(bytes_on_disk) AS disk_bytes\nFROM system.parts\nWHERE active\nGROUP BY database, table\nORDER BY disk_bytes DESC\nLIMIT 15", + "favorite": true, + "description": "Every active part in system.parts, summed per table, largest first. Horizontal Bar — hover any bar for the exact byte count.", + "chart": { + "cfg": { + "type": "hbar", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "table:String|disk_bytes:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-7", + "name": "Active parts by table", + "sql": "SELECT concat(database, '.', table) AS table, count() AS parts\nFROM system.parts\nWHERE active\nGROUP BY database, table\nORDER BY parts DESC\nLIMIT 15", + "favorite": true, + "description": "Active part *count* per table (not size) — a table climbing here between refreshes is trending toward \"too many parts\". Horizontal Bar.", + "chart": { + "cfg": { + "type": "hbar", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "table:String|parts:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-8", + "name": "Cumulative error counters", + "sql": "SELECT name, value AS times\nFROM system.errors\nWHERE value > 0\nORDER BY value DESC\nLIMIT 15", + "favorite": true, + "description": "system.errors — every error code the server has hit since last restart, most frequent first. A quick \"what's actually going wrong here\" check. 
Horizontal Bar.", + "chart": { + "cfg": { + "type": "hbar", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "name:String|times:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-9", + "name": "Queries per minute", + "sql": "SELECT toStartOfMinute(event_time) AS t, count() AS queries\nFROM system.query_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish'\nGROUP BY t\nORDER BY t", + "favorite": true, + "description": "Finished-query volume from system.query_log, bucketed per minute, over a {from:String}/{to:String} range (shared with every other time-ranged query below — the Dashboard filter bar renders one From/To pair that drives them all). A DateTime X axis is auto-detected as a time series → Line chart.", + "chart": { + "cfg": { + "type": "line", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "t:DateTime|queries:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-10", + "name": "Slowest query patterns — avg duration", + "sql": "SELECT left(any(query), 50) AS query, avg(query_duration_ms) AS avg_duration_ms\nFROM system.query_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish'\nGROUP BY normalized_query_hash\nORDER BY avg_duration_ms DESC\nLIMIT 15", + "favorite": true, + "description": "Distinct query shapes (system.query_log grouped by normalized_query_hash) ranked by average duration over a {from:String}/{to:String} range. 
Horizontal Bar of a non-count measure.", + "chart": { + "cfg": { + "type": "hbar", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "query:String|avg_duration_ms:Float64" + }, + "view": "chart" + }, + { + "id": "sys-11", + "name": "Query errors over time", + "sql": "SELECT toStartOfHour(event_time) AS t, errorCodeToName(exception_code) AS error, count() AS n\nFROM system.query_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND exception_code != 0\nGROUP BY t, error\nORDER BY t", + "favorite": true, + "description": "Failed queries from system.query_log over a {from:String}/{to:String} range, broken down by ClickHouse error name. The \"error\" column is used as the Series, producing grouped/stacked bars per error code.", + "chart": { + "cfg": { + "type": "bar", + "x": 0, + "y": [ + 2 + ], + "series": 1 + }, + "key": "t:DateTime|error:LowCardinality(String)|n:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-12", + "name": "Part lifecycle events over time", + "sql": "SELECT toStartOfHour(event_time) AS t, event_type, count() AS n\nFROM system.part_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String})\nGROUP BY t, event_type\nORDER BY t", + "favorite": true, + "description": "system.part_log over a {from:String}/{to:String} range — new/merged/mutated/downloaded/removed parts per hour, one query instead of five separate panels. 
\"event_type\" is the Series.", + "chart": { + "cfg": { + "type": "bar", + "x": 0, + "y": [ + 2 + ], + "series": 1 + }, + "key": "t:DateTime|event_type:Enum8('NewPart' = 1, 'MergeParts' = 2, 'DownloadPart' = 3, 'RemovePart' = 4, 'MutatePart' = 5, 'MovePart' = 6, 'MergePartsStart' = 7, 'MutatePartStart' = 8)|n:UInt64" + }, + "view": "chart" + }, + { + "id": "sys-13", + "name": "Memory usage over time", + "sql": "SELECT toStartOfMinute(event_time) AS t, avg(CurrentMetric_MemoryTracking) AS memory_bytes\nFROM system.metric_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String})\nGROUP BY t\nORDER BY t", + "favorite": true, + "description": "Average tracked memory (system.metric_log's CurrentMetric_MemoryTracking) per minute over a {from:String}/{to:String} range. Line chart.", + "chart": { + "cfg": { + "type": "line", + "x": 0, + "y": [ + 1 + ], + "series": null + }, + "key": "t:DateTime|memory_bytes:Float64" + }, + "view": "chart" + }, + { + "id": "sys-14", + "name": "Query cost breakdown — slowest patterns (detail)", + "sql": "SELECT\n normalized_query_hash,\n left(argMax(query, query_duration_ms), 60) AS sample_query,\n count() AS executions,\n max(query_duration_ms) AS max_ms,\n avg(query_duration_ms) AS avg_ms,\n sum(read_rows) AS read_rows,\n formatReadableSize(sum(read_bytes)) AS read_bytes,\n quantile(0.99)(memory_usage) AS p99_memory\nFROM system.query_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish'\nGROUP BY normalized_query_hash\nORDER BY avg_ms DESC\nLIMIT 15", + "favorite": false, + "description": "The deep-dive version of \"Slowest query patterns\": executions, max/avg duration, rows and bytes read, and p99 memory per query shape over a {from:String}/{to:String} range. Table view — too many columns for one chart, but the full picture behind the bar chart above." + } + ] +}
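A small sanity check over the generated library can confirm the two properties the generator relies on: each `chart.key` has the `name:type|…` shape, and the time-ranged entries all use the same placeholder names. The sketch below assumes only the JSON structure shown above. `queryParams`, `sharedParams`, and this `schemaKey` are illustrative helpers written for the example; the app's real `schemaKey` lives in `src/core/chart-data.js` and is not reproduced here.

```javascript
// Illustrative helper: collect the {name:Type} placeholders in one SQL string.
function queryParams(sql) {
  const found = new Map();
  for (const m of sql.matchAll(/\{([A-Za-z_]\w*):([^{}]+)\}/g)) {
    found.set(m[1], m[2]); // name -> declared ClickHouse type
  }
  return found;
}

// Illustrative helper: count how many queries use each placeholder name.
function sharedParams(queries) {
  const counts = new Map();
  for (const q of queries) {
    for (const name of queryParams(q.sql).keys()) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

// Illustrative re-statement of the key format described in the generator's
// comments: "name:type" pairs joined by "|".
function schemaKey(meta) {
  return meta.map((c) => `${c.name}:${c.type}`).join('|');
}

// Three entries copied from examples/system-explorer-charts.json.
const sample = [
  { id: 'sys-8', sql: "SELECT name, value AS times\nFROM system.errors\nWHERE value > 0\nORDER BY value DESC\nLIMIT 15" },
  { id: 'sys-9', sql: "SELECT toStartOfMinute(event_time) AS t, count() AS queries\nFROM system.query_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String}) AND type = 'QueryFinish'\nGROUP BY t\nORDER BY t" },
  { id: 'sys-13', sql: "SELECT toStartOfMinute(event_time) AS t, avg(CurrentMetric_MemoryTracking) AS memory_bytes\nFROM system.metric_log\nWHERE event_time BETWEEN parseDateTimeBestEffort({from:String}) AND parseDateTimeBestEffort({to:String})\nGROUP BY t\nORDER BY t" },
];

const counts = sharedParams(sample);
console.log([...counts.entries()]);

// The `meta` array mirrors what FORMAT JSON returns for sys-9's columns.
const key = schemaKey([
  { name: 't', type: 'DateTime' },
  { name: 'queries', type: 'UInt64' },
]);
console.log(key);
```

Because both time-ranged samples resolve to the same two names, a filter bar keyed on placeholder name needs only one From/To pair for them; the snapshot query contributes no placeholders at all.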