fix(telemetry): bound Prometheus distribution buffer (fork pin) #4663

Open
robacourt wants to merge 1 commit into main from rob/pin-prometheus-core-fork

Conversation

@robacourt

Contributor

Summary

Fixes unbounded ETS growth in the Prometheus metrics reporter. When ELECTRIC_PROMETHEUS_PORT is set but the /metrics endpoint is scraped infrequently — or never, as on OpenTelemetry-only deployments — telemetry_metrics_prometheus_core buffers one ETS row per distribution observation and only drains it on scrape. The per-transaction receive_lag distribution dominates, so the prometheus_metrics_dist table grows without bound. A customer hit an 8 GB+ ETS table and eventual OOM this way.
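To check whether a running node is affected, the row count and memory footprint of the table can be read with `:ets.info/2`. The sketch below is illustrative only: the `EtsFootprint` module and the `:demo_dist` stand-in table are hypothetical, and the assumption that the real table is registered under the atom `:prometheus_metrics_dist` should be confirmed on the node (for example with `:ets.all/0`).

```elixir
defmodule EtsFootprint do
  @doc "Returns {row_count, approx_bytes} for a table, or :undefined if it does not exist."
  def measure(table) do
    case :ets.info(table, :size) do
      :undefined ->
        :undefined

      rows ->
        # :ets.info/2 reports memory in machine words, not bytes.
        words = :ets.info(table, :memory)
        {rows, words * :erlang.system_info(:wordsize)}
    end
  end
end

# Stand-in for the reporter's distribution table: one row per observation.
table = :ets.new(:demo_dist, [:duplicate_bag, :public])

for lag_ms <- 1..1_000 do
  :ets.insert(table, {{[:receive_lag], %{}}, lag_ms})
end

{rows, bytes} = EtsFootprint.measure(table)
IO.puts("#{rows} rows, ~#{bytes} bytes")
```

On a live node, calling `EtsFootprint.measure(:prometheus_metrics_dist)` twice a few minutes apart without scraping in between would show whether the row count only ever increases.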

This pins telemetry_metrics_prometheus_core to a fork that bounds the buffer by aggregating automatically rather than only on scrape: when the buffer reaches a size threshold (default 10k buffered samples) or, as a fallback, when a time interval elapses (default 60s). The fork also serializes aggregation in the registry process, which fixes a pre-existing race between overlapping scrapes.
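The bounding strategy can be illustrated with a small GenServer. This is a minimal sketch, not the fork's code: the `BoundedBuffer` module, the `:max_samples` and `:flush_ms` option names, and the reduction of a distribution to a running count and sum (the real reporter aggregates into histogram buckets) are all assumptions made for illustration. Because every aggregation runs inside one process, two overlapping scrapes cannot fold the same samples twice.

```elixir
defmodule BoundedBuffer do
  use GenServer

  # Defaults mirror the thresholds quoted in the PR; the option names are hypothetical.
  @default_max_samples 10_000
  @default_flush_ms 60_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def observe(pid, value), do: GenServer.cast(pid, {:observe, value})
  def scrape(pid), do: GenServer.call(pid, :scrape)
  def buffered(pid), do: GenServer.call(pid, :buffered)

  @impl true
  def init(opts) do
    state = %{
      max: Keyword.get(opts, :max_samples, @default_max_samples),
      flush_ms: Keyword.get(opts, :flush_ms, @default_flush_ms),
      buffer: [],
      buffered: 0,
      count: 0,
      sum: 0
    }

    schedule_flush(state)
    {:ok, state}
  end

  @impl true
  def handle_cast({:observe, value}, state) do
    state = %{state | buffer: [value | state.buffer], buffered: state.buffered + 1}

    # Size trigger: fold the buffer as soon as it reaches the threshold.
    if state.buffered >= state.max do
      {:noreply, aggregate(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_call(:scrape, _from, state) do
    # Scrape trigger: aggregate first, then report the totals.
    state = aggregate(state)
    {:reply, %{count: state.count, sum: state.sum}, state}
  end

  def handle_call(:buffered, _from, state), do: {:reply, state.buffered, state}

  @impl true
  def handle_info(:flush, state) do
    # Time trigger: fold whatever is buffered, even if nobody scrapes.
    schedule_flush(state)
    {:noreply, aggregate(state)}
  end

  defp schedule_flush(%{flush_ms: ms}), do: Process.send_after(self(), :flush, ms)

  defp aggregate(state) do
    %{
      state
      | buffer: [],
        buffered: 0,
        count: state.count + state.buffered,
        sum: state.sum + Enum.sum(state.buffer)
    }
  end
end
```

With `max_samples: 3`, the third `observe/2` call empties the buffer, so memory stays proportional to the threshold rather than to the time since the last scrape.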

Upstream PR: beam-telemetry/telemetry_metrics_prometheus_core#77

Why a git dep is safe here

The telemetry deps (the electric_telemetry path dep and its transitive telemetry_metrics_prometheus_core) are gated behind MIX_TARGET=application in sync-service/mix.exs (telemetry_deps(_) -> [] otherwise). The Hex-published electric package is built with the default target, so this dependency is absent from the published dep tree — the git pin cannot affect mix hex.publish. It only affects the standalone sync-service / Docker / CI build, where mix deps.get fetches git deps fine (git is installed in the builder image). electric-telemetry itself is not published to Hex.
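The target gating described above can be sketched as a pair of function clauses. This is an assumed shape, not a copy of sync-service/mix.exs: the `TargetGatedDeps` module name and the relative path are hypothetical, and in a real mix.exs the argument would come from `Mix.target/0`, which reflects the `MIX_TARGET` environment variable.

```elixir
defmodule TargetGatedDeps do
  # Only the standalone application build pulls in the telemetry package
  # (and, transitively, telemetry_metrics_prometheus_core).
  # The path below is an assumption for illustration.
  def telemetry_deps(:application) do
    [{:electric_telemetry, path: "../electric-telemetry"}]
  end

  # Any other target, including the default used for Hex publishing, gets nothing.
  def telemetry_deps(_other), do: []
end

IO.inspect(TargetGatedDeps.telemetry_deps(:application), label: "application target")
IO.inspect(TargetGatedDeps.telemetry_deps(:host), label: "default target")
```

Since the second clause returns an empty list for the default target, a git dependency further down that branch never appears in the tree that `mix hex.publish` validates.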

Changes

  • packages/electric-telemetry/mix.exs — pin to the fork branch, with a comment to revert to ~> 1.2 once upstream lands on Hex.
  • packages/electric-telemetry/mix.lock + packages/sync-service/mix.lock — locked to the fork commit (only that one entry changes).
  • Changeset (@core/sync-service patch).
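
For readers unfamiliar with git pins in Mix, the dependency entry in the first item has roughly the following shape. The fork URL and branch name are not stated in this PR, so both values below are placeholders, not the real ones.

```elixir
# Sketch of a deps/0 entry pinning a dependency to a fork branch.
# FORK_OWNER and FORK_BRANCH are placeholders; substitute the real values.
# Revert to {:telemetry_metrics_prometheus_core, "~> 1.2"} once the upstream fix is on Hex.
{:telemetry_metrics_prometheus_core,
 git: "https://example.invalid/FORK_OWNER/telemetry_metrics_prometheus_core.git",
 branch: "FORK_BRANCH"}
```

Even with a `branch:` option, mix.lock records the exact commit that was fetched, which is why both lock files change in this PR.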

Verified electric-telemetry compiles against the fork. Revert this pin to the Hex release after #77 merges and ships.

🤖 Generated with Claude Code

@codecov

codecov Bot commented Jun 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.20%. Comparing base (b66ebf7) to head (6987ca4).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4663      +/-   ##
==========================================
+ Coverage   60.03%   60.20%   +0.17%     
==========================================
  Files         395      410      +15     
  Lines       43747    44343     +596     
  Branches    12579    12583       +4     
==========================================
+ Hits        26262    26698     +436     
- Misses      17407    17567     +160     
  Partials       78       78              
Flag Coverage Δ
electric-telemetry 71.30% <ø> (?)
elixir 71.30% <ø> (?)
packages/agents 72.64% <ø> (ø)
packages/agents-mcp 77.70% <ø> (ø)
packages/agents-mobile 80.67% <ø> (ø)
packages/agents-runtime 83.78% <ø> (+0.06%) ⬆️
packages/agents-server 75.65% <ø> (ø)
packages/agents-server-ui 8.32% <ø> (ø)
packages/electric-ax 51.06% <ø> (ø)
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 91.83% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 60.05% <ø> (+0.02%) ⬆️
unit-tests 60.20% <ø> (+0.17%) ⬆️

When ELECTRIC_PROMETHEUS_PORT is set but the /metrics endpoint is scraped
infrequently or never (e.g. OTel-only deployments), telemetry_metrics_prometheus_core
buffers one ETS row per distribution observation and only drains on scrape, so
the dist table grows without bound (the per-transaction receive_lag metric
dominates). An 8GB+ ETS table and an eventual OOM were observed in the field.

Pin telemetry_metrics_prometheus_core to a fork that bounds the buffer by
aggregating automatically on a size threshold (default 10k samples) and a time
fallback (default 60s), in addition to on scrape. Only affects the
MIX_TARGET=application build (the standalone sync-service / Docker image); the
telemetry deps are target-gated out of the Hex package, so this git dep does not
affect publishing of `electric`.

Upstream PR: beam-telemetry/telemetry_metrics_prometheus_core#77
Revert to the Hex release once it lands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YW7Njz5ZpBDaoGviW1eVR8
@robacourt robacourt force-pushed the rob/pin-prometheus-core-fork branch from bf688eb to 6987ca4 Compare June 29, 2026 16:54
@robacourt robacourt closed this Jun 30, 2026
@robacourt robacourt reopened this Jun 30, 2026