fix(backend): handle Gitea ERR_STREAM_PREMATURE_CLOSE during sync #1405
brendan-kellam wants to merge 3 commits into
Force identity encoding / connection close in the Gitea API fetch to avoid cross-fetch failing to read the response body, and guard against null repo data so a failed fetch no longer crashes on repo.full_name. Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
Walkthrough
This PR fixes a Gitea sync failure caused by
Changes
Gitea sync reliability fix
Sequence Diagram(s)
sequenceDiagram
participant SyncJob
participant giteaApi
participant customFetch
participant GiteaServer
SyncJob->>giteaApi: repoGet(org, repo)
giteaApi->>customFetch: fetch request
customFetch->>GiteaServer: HTTP request (Accept-Encoding identity, Connection close)
GiteaServer-->>customFetch: response body
customFetch-->>giteaApi: response
giteaApi-->>SyncJob: HttpResponse(data, error)
alt error present or data missing
SyncJob->>SyncJob: throw error
else valid data
SyncJob->>SyncJob: filter null/undefined repos, log warning
SyncJob->>SyncJob: continue aggregation
end
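The "valid data" branch of the diagram (filter null/undefined repos, log a warning, continue aggregation) could look roughly like the sketch below. `RepoLike`, `filterMissingRepos`, and the `warn` callback are hypothetical names used for illustration only; they are not taken from gitea.ts.

```typescript
// Hypothetical minimal repo shape; the real Gitea repository type has many more fields.
interface RepoLike {
    full_name?: string;
}

// Drops null/undefined entries so later code never dereferences a missing repo,
// and reports how many entries were skipped through the supplied warn callback.
function filterMissingRepos(
    repos: ReadonlyArray<RepoLike | null | undefined>,
    warn: (message: string) => void,
): RepoLike[] {
    const present: RepoLike[] = [];
    let missing = 0;
    for (const repo of repos) {
        if (repo === null || repo === undefined) {
            missing += 1;
            continue;
        }
        present.push(repo);
    }
    if (missing > 0) {
        warn(`Skipped ${missing} repo(s) with no data returned by the Gitea API`);
    }
    return present;
}
```

Passing the logger in as a callback keeps the sketch independent of whichever logging helper the backend actually uses.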
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
…rr_stream_premature_close-75d2
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/backend/src/gitea.ts`:
- Around line 230-233: The paginated repo listing paths in
getReposOwnedByUsers() and getReposForOrgs() are missing the same
empty/error-response guard used elsewhere, so a premature-close from paginate()
can bubble up as an opaque failure. Update the pagination handling in these
functions in gitea.ts to explicitly check for response.error or missing
response.data after each page fetch, and throw a clear error with the relevant
repo/org context before mapping or accumulating results.
- Around line 15-29: The custom fetch wrapper in gitea.ts is forcing Connection:
close for every request, which breaks keep-alive reuse during paginated
org/repo/user syncs. Update customFetch to apply this workaround only for
affected Gitea hosts or error-prone cases, or switch to a keep-alive agent while
still preserving the Accept-Encoding identity workaround. Keep the change
localized to customFetch so the rest of the Gitea request flow continues using
normal connection reuse.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 128bae05-8209-4db2-b9bc-93dcd60bd6ba
📒 Files selected for processing (2)
CHANGELOG.md
packages/backend/src/gitea.ts
// Some Gitea instances (particularly when behind certain reverse proxies or with
// response compression enabled) cause `cross-fetch` to fail while reading the
// response body with ERR_STREAM_PREMATURE_CLOSE. Forcing identity encoding and
// closing the connection avoids the premature close.
// @see https://github.com/sourcebot-dev/sourcebot/issues/1404
const customFetch: typeof fetch = (url, options = {}) => {
    return fetch(url, {
        ...options,
        headers: {
            ...(options.headers ?? {}),
            'Accept-Encoding': 'identity',
            'Connection': 'close',
        },
    });
};
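One detail worth checking in the wrapper above: object spread only copies own enumerable properties, so if a caller ever passed `options.headers` as a `Headers` instance or as an array of pairs, those entries would not come through as named headers. Whether the Gitea client does that is not established here. The sketch below shows one possible normalization step, assuming a hypothetical `HeadersLike` structural type and hypothetical helper names (`toPlainHeaders`, `withGiteaWorkaroundHeaders`); it lowercases names because HTTP header names are case-insensitive.

```typescript
// Structural stand-in for a Headers object; only forEach is relied on (assumption).
interface HeadersLike {
    forEach(callback: (value: string, name: string) => void): void;
}

type HeaderInput = HeadersLike | Array<[string, string]> | Record<string, string> | undefined;

// Converts any supported header shape into a plain object with lowercase names.
function toPlainHeaders(input: HeaderInput): Record<string, string> {
    const plain: Record<string, string> = {};
    if (input === undefined) {
        return plain;
    }
    if (Array.isArray(input)) {
        for (const [name, value] of input) {
            plain[name.toLowerCase()] = value;
        }
        return plain;
    }
    if (typeof (input as HeadersLike).forEach === 'function') {
        (input as HeadersLike).forEach((value, name) => {
            plain[name.toLowerCase()] = value;
        });
        return plain;
    }
    const record = input as Record<string, string>;
    Object.keys(record).forEach((name) => {
        const value = record[name];
        if (value !== undefined) {
            plain[name.toLowerCase()] = value;
        }
    });
    return plain;
}

// Applies the two workaround headers from this PR on top of the caller's headers.
function withGiteaWorkaroundHeaders(input: HeaderInput): Record<string, string> {
    return {
        ...toPlainHeaders(input),
        'accept-encoding': 'identity',
        'connection': 'close',
    };
}
```

The result could be passed as `headers` in place of the inline spread, leaving the rest of the wrapper unchanged.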
🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🌐 Web query:
Does forcing a Connection: close header on every fetch request in Node.js (via cross-fetch / node-fetch) cause performance regressions or socket errors with HTTP keep-alive agents?
💡 Result:
Forcing a Connection: close header on every fetch request in Node.js can cause performance regressions and, in certain environments, trigger socket errors or connectivity issues [1][2]. The impact of this practice is primarily negative, for the following reasons:
1. Performance Regression: HTTP keep-alive is designed to improve performance by reusing existing TCP connections for multiple requests [3][4]. By forcing Connection: close, you instruct the client to terminate the TCP connection after each request [3]. This forces the client to perform a full TCP (and potentially TLS) handshake for every single request, significantly increasing latency and overhead, especially for applications making frequent calls to the same host [4].
2. Socket Errors and Compatibility: Modern Node.js versions and HTTP agents are increasingly optimized for keep-alive by default [2][5]. Forcing Connection: close can conflict with the internal management logic of the Node.js http.Agent [1][2]. Specifically, sending a Connection: close header on a connection that the underlying agent intends to keep open can lead to protocol mismatches, resulting in socket hang-ups or ECONNRESET errors [2].
3. Library-Specific Context: In older versions of libraries like node-fetch, the library automatically appended a Connection: close header when no agent was provided [6]. This behavior was later identified as a source of performance degradation and instability in newer Node.js releases, leading to its removal in subsequent updates [1][2]. Modern implementations rely on the Node.js http.Agent to manage connection lifecycles appropriately [1][7].
To manage connection behavior correctly, you should:
- Use an http.Agent with the keepAlive: true option enabled if you wish to reuse connections [5][8].
- Avoid manually setting the Connection header; allow the Node.js Agent to manage the header based on the agent's configuration and the specific request/response lifecycle [3][1].
Top results: [3][1][2][5]
Citations:
- 1: fix: Remove the default connection close header. node-fetch/node-fetch#1736
- 2: "socket hang up" / ECONNRESET on consecutive requests with Node.js 19 and Node.js 20 node-fetch/node-fetch#1735
- 3: https://www.thenodebook.com/http/keepalive-agents-pools
- 4: https://connectreport.com/blog/tuning-http-keep-alive-in-node-js/
- 5: https://github.com/node-fetch/node-fetch/
- 6: https://github.com/node-fetch/node-fetch/blob/37ac459cfd0eafdf5bbb3d083aa82f0f2a3c9b75/README.md
- 7: fix: Remove the default connection close header (v2 branch) node-fetch/node-fetch#1765
- 8: https://stackoverflow.com/questions/62500011/reuse-tcp-connection-with-node-fetch-in-node-js
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== gitea.ts ==\n'
sed -n '1,220p' packages/backend/src/gitea.ts
printf '\n== package references ==\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true
printf '\n== file list ==\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'
Repository: sourcebot-dev/sourcebot
Length of output: 10256
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,220p' packages/backend/src/gitea.ts
printf '\n---\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true
printf '\n---\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'
Repository: sourcebot-dev/sourcebot
Length of output: 10207
🌐 Web query:
cross-fetch Node.js backend uses node-fetch and how it handles the Connection header and keep-alive agent
💡 Result:
Cross-fetch does not implement its own logic for connection handling or HTTP agents; instead, it acts as a proxy [1][2][3]. When running in a Node.js environment, cross-fetch imports and exports node-fetch [1][3]. Consequently, its behavior regarding the Connection header and keep-alive agents is identical to that of node-fetch [1][4].
Regarding connection handling in node-fetch:
1. Connection Header: Historically, node-fetch included a default Connection: close header, which could interfere with connection pooling and keep-alive functionality [5][6]. This default behavior was removed in later versions to allow the underlying Node.js http/https implementation to manage the Connection header according to standard defaults (which typically support keep-alive when agents are used) [6].
2. Keep-Alive Agents: Node-fetch supports the use of custom http(s).Agent instances via the agent option in the fetch request configuration [4]. When an agent is provided, node-fetch delegates the responsibility of connection management, including keep-alive settings, to that agent [4][6]. If no agent is explicitly provided, node-fetch relies on the default Node.js global agent [4]. Since Node.js 19, keepAlive is true by default in the global agent, but in earlier versions, users needed to manually configure and pass an agent with keepAlive: true to utilize connection pooling effectively [4].
In summary, to enable keep-alive when using cross-fetch in a Node.js backend, you should follow standard node-fetch practices: pass an http(s).Agent configured with { keepAlive: true } in the options object of your fetch call [4][5].
Citations:
- 1: https://github.com/lquixada/cross-fetch/
- 2: https://www.npmjs.com/package/cross-fetch
- 3: https://npm-compare.com/cross-fetch,isomorphic-fetch,node-fetch
- 4: https://github.com/node-fetch/node-fetch/blob/main/README.md
- 5: keep-alive connections and http / https agents node-fetch/node-fetch#110
- 6: fix: Remove the default connection close header (v2 branch) node-fetch/node-fetch#1765
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== root package.json ==\n'
sed -n '1,220p' package.json
printf '\n== backend package.json ==\n'
sed -n '1,220p' packages/backend/package.json
printf '\n== cross-fetch/node-fetch package metadata in lockfile ==\n'
awk '
$0 ~ /^"cross-fetch@npm:\^4\.0\.0":$/ {show=1}
show {print}
show && NF==0 {exit}
' yarn.lock
printf '\n== node-fetch 2.7.0 metadata in lockfile ==\n'
awk '
$0 ~ /^"node-fetch@npm:\^2\.6\.7, node-fetch@npm:\^2\.7\.0":$/ {show=1}
show {print}
show && NF==0 {exit}
' yarn.lock
Repository: sourcebot-dev/sourcebot
Length of output: 6159
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== paginate usage in gitea.ts ==\n'
rg -n 'paginate\(' packages/backend/src/gitea.ts
printf '\n== connectionUtils.ts ==\n'
sed -n '1,220p' packages/backend/src/connectionUtils.ts
Repository: sourcebot-dev/sourcebot
Length of output: 1447
Avoid Connection: close on every Gitea request.
This path paginates org/repo/user listings, so a sync can make many requests; forcing a fresh connection each time drops keep-alive reuse and adds avoidable TCP/TLS overhead. Gate the workaround to affected hosts or use a keep-alive agent instead.
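One way to gate the workaround, sketched under assumptions: always send `Accept-Encoding: identity`, but add `Connection: close` only for hosts known to be affected. The `AFFECTED_GITEA_HOSTS` list, its example entry, and the `giteaWorkaroundHeaders` helper are hypothetical; how affected hosts would be identified or configured is not specified in this PR.

```typescript
// Hypothetical allowlist of Gitea hosts that need the Connection: close workaround.
const AFFECTED_GITEA_HOSTS: ReadonlyArray<string> = ['gitea.internal.example'];

// Returns the extra headers to merge into a Gitea request for the given host.
// Accept-Encoding: identity is always applied; Connection: close only for listed hosts,
// so other hosts keep normal connection reuse.
function giteaWorkaroundHeaders(
    host: string,
    affectedHosts: ReadonlyArray<string> = AFFECTED_GITEA_HOSTS,
): Record<string, string> {
    const headers: Record<string, string> = { 'Accept-Encoding': 'identity' };
    const normalizedHost = host.toLowerCase();
    const isAffected = affectedHosts.some((candidate) => candidate.toLowerCase() === normalizedHost);
    if (isAffected) {
        headers['Connection'] = 'close';
    }
    return headers;
}
```

Inside `customFetch`, the host could be derived from the request URL and the returned object spread into `headers`, which keeps the change localized as the comment requests.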
if (response.error || !response.data) {
    throw response.error ?? new Error(`Received empty response body while fetching repository ${repo}`);
}
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect getReposForOrgs/getReposOwnedByUsers for equivalent response validation
fd gitea.ts packages/backend/src --exec sed -n '96,219p' {}
Repository: sourcebot-dev/sourcebot
Length of output: 3484
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate paginate and the repo listing callers/validators
rg -n "const paginate|function paginate|paginate\(" packages/backend/src -g '*.ts' -g '*.tsx'
# Show the relevant sections around paginate and repoGet in gitea.ts
file="packages/backend/src/gitea.ts"
start=$(rg -n "const paginate|function paginate" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
sed -n "$((start-20)),$((start+120))p" "$file"
fi
echo "---- repoGet section ----"
rg -n "repoGet\(|Received empty response body|response\.error|!response\.data" "$file" -n -A4 -B4
echo "---- repoCompileUtils non-null assertions ----"
rg -n "full_name!|clone_url!|id!" packages/backend/src/repoCompileUtils.ts -n -A3 -B3
Repository: sourcebot-dev/sourcebot
Length of output: 5303
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect measure() and any existing handling of HttpResponse errors/empty payloads
file="packages/backend/src/gitea.ts"
rg -n "const measure|function measure|measure\(" packages/backend/src -g '*.ts'
echo "---- measure and surrounding helpers ----"
start=$(rg -n "const measure|function measure" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
sed -n "$((start-40)),$((start+80))p" "$file"
fi
echo "---- HttpResponse usage in gitea.ts ----"
rg -n "HttpResponse|response\.error|response\.data|data: response" "$file" -n -A3 -B3
echo "---- API client type definitions if present ----"
fd -a "types.ts" packages/backend/src node_modules 2>/dev/null | head -n 20
Repository: sourcebot-dev/sourcebot
Length of output: 2613
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read measure() and the gitea pagination helper
sed -n '1,80p' packages/backend/src/utils.ts
echo "----"
sed -n '120,320p' packages/backend/src/gitea.ts
# Find where HttpResponse comes from if it's referenced in the repo
rg -n "interface HttpResponse|type HttpResponse|HttpResponse<" packages/backend/src -g '*.ts'
Repository: sourcebot-dev/sourcebot
Length of output: 7603
Guard the paginated org/user repo-listing responses
packages/backend/src/gitea.ts:145,185 getReposOwnedByUsers() / getReposForOrgs() still rely on paginate() without the same empty/error-response guard, so a premature-close response can surface as an opaque runtime failure in these sync paths.
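A per-page guard along these lines might address the finding. This is a minimal sketch, not the actual `paginate()` helper: `PageResponse` and `requirePageData` are hypothetical names, and the response shape is assumed to mirror the `{ data, error }` pair that `repoGet()` returns.

```typescript
// Assumed response shape for one page of a repo listing (mirrors { data, error }).
interface PageResponse<T> {
    data: T[] | null | undefined;
    error: unknown;
}

// Returns the page's items, or throws an error that names the org/user and page
// so a premature close does not surface later as an opaque failure.
function requirePageData<T>(response: PageResponse<T>, context: string, page: number): T[] {
    const { data, error } = response;
    if (error || !data) {
        let detail = 'empty response body';
        if (error instanceof Error) {
            detail = error.message;
        } else if (error !== undefined && error !== null) {
            detail = JSON.stringify(error);
        }
        throw new Error(`Failed to list repos for ${context} (page ${page}): ${detail}`);
    }
    return data;
}
```

Each page fetched for an org or user would pass through `requirePageData` before its items are mapped or accumulated.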
Fixes SOU-1484
Fixes #1404
Syncing repos from some self-hosted Gitea instances fails when cross-fetch can't read the API response body (ERR_STREAM_PREMATURE_CLOSE). repoGet() then returns { data: null, error: {...} }, the null gets pushed into the repo list, and the sync crashes on repo.full_name.
Changes:
- Force Accept-Encoding: identity and Connection: close in the Gitea API fetch, which avoids the premature close (root cause).
- Guard the repoGet response so a failed fetch surfaces as a warning/error instead of a silent null.
Summary by CodeRabbit
Bug Fixes
Documentation