fix(backend): handle Gitea ERR_STREAM_PREMATURE_CLOSE during sync#1405

Open
brendan-kellam wants to merge 3 commits into main from brendan/sou-1484-bug-gitea-sync-fails-with-err_stream_premature_close-75d2

Conversation

@brendan-kellam brendan-kellam commented Jun 30, 2026

Contributor

Fixes SOU-1484
Fixes #1404

Syncing repos from some self-hosted Gitea instances fails when cross-fetch can't read the API response body (ERR_STREAM_PREMATURE_CLOSE). repoGet() then returns { data: null, error: {...} }, the null gets pushed into the repo list, and the sync crashes on repo.full_name.

Changes:

  • Wrap the Gitea fetch to force Accept-Encoding: identity and Connection: close, which avoids the premature close (root cause).
  • Throw on an errored/empty repoGet response so a failed fetch surfaces as a warning/error instead of a silent null.
  • Guard the repo filter against null/undefined entries.
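
The second and third changes can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Repo` and `RepoResponse` shapes and the helper names `unwrapRepo` and `dropMissingRepos` are hypothetical stand-ins for whatever the Gitea client and `gitea.ts` really use.

```typescript
// Hypothetical shapes for illustration; the real types come from the Gitea client.
interface Repo {
    full_name?: string;
}

interface RepoResponse {
    data: Repo | null;
    error: unknown;
}

// Throw on an errored or empty response instead of passing a silent null along.
function unwrapRepo(response: RepoResponse, repoName: string): Repo {
    if (response.error || !response.data) {
        throw response.error ?? new Error(`Received empty response body while fetching repository ${repoName}`);
    }
    return response.data;
}

// Drop null/undefined entries before anything reads repo.full_name.
function dropMissingRepos(repos: Array<Repo | null | undefined>): Repo[] {
    return repos.filter((repo): repo is Repo => repo != null);
}
```

With both in place, a failed fetch is reported where it happens, and a stray null that still reaches the list is skipped rather than dereferenced.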

Summary by CodeRabbit

  • Bug Fixes

    • Fixed Gitea sync failures caused by interrupted response streams.
    • Improved repository syncing to skip invalid or empty repository entries instead of failing.
    • Added safer handling when fetching repository details, reducing errors from missing responses.
  • Documentation

    • Updated the changelog with the sync-failure fix.

Force identity encoding / connection close in the Gitea API fetch to avoid
cross-fetch failing to read the response body, and guard against null repo
data so a failed fetch no longer crashes on repo.full_name.

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Contributor

Walkthrough

This PR fixes a Gitea sync failure caused by ERR_STREAM_PREMATURE_CLOSE. A custom fetch wrapper forces identity encoding and connection close headers. Repository fetch responses are now validated for errors/missing data, and null/undefined repository entries are filtered out with a warning log. A changelog entry documents the fix.

Changes

Gitea sync reliability fix

| Layer / File(s) | Summary |
| --- | --- |
| Custom fetch wrapper and client wiring: packages/backend/src/gitea.ts | Introduces a customFetch wrapper around cross-fetch that sets Accept-Encoding: identity and Connection: close headers, and wires it into giteaApi configuration replacing the raw fetch reference. |
| Response validation and null repo filtering: packages/backend/src/gitea.ts, CHANGELOG.md | Throws when HttpResponse from repoGet has an error or missing data; filters out null/undefined repos during aggregation with a warning log; documents the fix in the changelog. |

Sequence Diagram(s)

sequenceDiagram
  participant SyncJob
  participant giteaApi
  participant customFetch
  participant GiteaServer

  SyncJob->>giteaApi: repoGet(org, repo)
  giteaApi->>customFetch: fetch request
  customFetch->>GiteaServer: HTTP request (Accept-Encoding identity, Connection close)
  GiteaServer-->>customFetch: response body
  customFetch-->>giteaApi: response
  giteaApi-->>SyncJob: HttpResponse(data, error)
  alt error present or data missing
    SyncJob->>SyncJob: throw error
  else valid data
    SyncJob->>SyncJob: filter null/undefined repos, log warning
    SyncJob->>SyncJob: continue aggregation
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main fix: handling Gitea premature-close sync failures in the backend. |
| Linked Issues check | ✅ Passed | The changes address the linked bug by hardening fetch, rejecting null repoGet responses, and guarding null repo entries. |
| Out of Scope Changes check | ✅ Passed | The PR stays focused on the Gitea sync failure and adds only a changelog note alongside the code fix. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Co-authored-by: linear-code[bot] <222613912+linear-code[bot]@users.noreply.github.com>
@brendan-kellam brendan-kellam marked this pull request as ready for review June 30, 2026 23:02

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/backend/src/gitea.ts`:
- Around line 230-233: The paginated repo listing paths in
getReposOwnedByUsers() and getReposForOrgs() are missing the same
empty/error-response guard used elsewhere, so a premature-close from paginate()
can bubble up as an opaque failure. Update the pagination handling in these
functions in gitea.ts to explicitly check for response.error or missing
response.data after each page fetch, and throw a clear error with the relevant
repo/org context before mapping or accumulating results.
- Around line 15-29: The custom fetch wrapper in gitea.ts is forcing Connection:
close for every request, which breaks keep-alive reuse during paginated
org/repo/user syncs. Update customFetch to apply this workaround only for
affected Gitea hosts or error-prone cases, or switch to a keep-alive agent while
still preserving the Accept-Encoding identity workaround. Keep the change
localized to customFetch so the rest of the Gitea request flow continues using
normal connection reuse.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 128bae05-8209-4db2-b9bc-93dcd60bd6ba

📥 Commits

Reviewing files that changed from the base of the PR and between 216c7d8 and 474b7be.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • packages/backend/src/gitea.ts

Comment on lines +15 to +29
// Some Gitea instances (particularly when behind certain reverse proxies or with
// response compression enabled) cause `cross-fetch` to fail while reading the
// response body with ERR_STREAM_PREMATURE_CLOSE. Forcing identity encoding and
// closing the connection avoids the premature close.
// @see https://github.com/sourcebot-dev/sourcebot/issues/1404
const customFetch: typeof fetch = (url, options = {}) => {
    return fetch(url, {
        ...options,
        headers: {
            ...(options.headers ?? {}),
            'Accept-Encoding': 'identity',
            'Connection': 'close',
        },
    });
};
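
One detail of the header merge above may be worth a sketch. Object spread copies only own enumerable properties, so if a caller ever passed a `Headers` instance or an array of pairs as `options.headers`, the spread would not carry those entries over. Whether the Gitea client does that is not established by this PR. The helper name `withGiteaWorkaroundHeaders` and the `HeaderInput` alias below are hypothetical, and the sketch assumes a runtime with a global `Headers` class (Node 18+ or a browser).

```typescript
// Accepted header shapes, mirroring what the fetch API allows.
type HeaderInput = Headers | Record<string, string> | [string, string][] | undefined;

// Normalise whatever the caller passed into a Headers instance first,
// then apply the two workaround headers on top of it.
function withGiteaWorkaroundHeaders(input: HeaderInput): Headers {
    const headers = new Headers(input);
    headers.set('Accept-Encoding', 'identity');
    headers.set('Connection', 'close');
    return headers;
}
```

If the client only ever passes plain objects, the spread in `customFetch` behaves the same way and this extra step is unnecessary.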


🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does forcing a Connection: close header on every fetch request in Node.js (via cross-fetch / node-fetch) cause performance regressions or socket errors with HTTP keep-alive agents?

💡 Result:

Forcing a Connection: close header on every fetch request in Node.js can cause performance regressions and, in certain environments, trigger socket errors or connectivity issues [1][2]. The impact of this practice is primarily negative due to the following reasons:

1. Performance Regression: HTTP keep-alive is designed to improve performance by reusing existing TCP connections for multiple requests [3][4]. By forcing Connection: close, you instruct the client to terminate the TCP connection after each request [3]. This forces the client to perform a full TCP (and potentially TLS) handshake for every single request, significantly increasing latency and overhead, especially for applications making frequent calls to the same host [4].
2. Socket Errors and Compatibility: Modern Node.js versions and HTTP agents are increasingly optimized for keep-alive by default [2][5]. Forcing Connection: close can conflict with the internal management logic of the Node.js http.Agent [1][2]. Specifically, sending a Connection: close header on a connection that the underlying agent intends to keep open can lead to protocol mismatches, resulting in socket hang-ups or ECONNRESET errors [2].
3. Library-Specific Context: In older versions of libraries like node-fetch, the library automatically appended a Connection: close header when no agent was provided [6]. This behavior was later identified as a source of performance degradation and instability in newer Node.js releases, leading to its removal in subsequent updates [1][2]. Modern implementations rely on the Node.js http.Agent to manage connection lifecycles appropriately [1][7].

To manage connection behavior correctly, you should:

  • Use an http.Agent with the keepAlive: true option enabled if you wish to reuse connections [5][8].
  • Avoid manually setting the Connection header; allow the Node.js Agent to manage the header based on the agent's configuration and the specific request/response lifecycle [3][1].

Top results: [3][1][2][5]


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== gitea.ts ==\n'
sed -n '1,220p' packages/backend/src/gitea.ts

printf '\n== package references ==\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
  'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
  packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true

printf '\n== file list ==\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'

Repository: sourcebot-dev/sourcebot

Length of output: 10256


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,220p' packages/backend/src/gitea.ts
printf '\n---\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
  'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
  packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true
printf '\n---\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'

Repository: sourcebot-dev/sourcebot

Length of output: 10207


🌐 Web query:

cross-fetch Node.js backend uses node-fetch and how it handles the Connection header and keep-alive agent

💡 Result:

Cross-fetch does not implement its own logic for connection handling or HTTP agents; instead, it acts as a proxy [1][2][3]. When running in a Node.js environment, cross-fetch imports and exports node-fetch [1][3]. Consequently, its behavior regarding the Connection header and keep-alive agents is identical to that of node-fetch [1][4]. Regarding connection handling in node-fetch:

1. Connection Header: Historically, node-fetch included a default Connection: close header, which could interfere with connection pooling and keep-alive functionality [5][6]. This default behavior was removed in later versions to allow the underlying Node.js http/https implementation to manage the Connection header according to standard defaults (which typically support keep-alive when agents are used) [6].
2. Keep-Alive Agents: Node-fetch supports the use of custom http(s).Agent instances via the agent option in the fetch request configuration [4]. When an agent is provided, node-fetch delegates the responsibility of connection management, including keep-alive settings, to that agent [4][6]. If no agent is explicitly provided, node-fetch relies on the default Node.js global agent [4]. Since Node.js 19, keepAlive is true by default in the global agent, but in earlier versions, users needed to manually configure and pass an agent with keepAlive: true to utilize connection pooling effectively [4].

In summary, to enable keep-alive when using cross-fetch in a Node.js backend, you should follow standard node-fetch practices: pass an http(s).Agent configured with { keepAlive: true } in the options object of your fetch call [4][5].
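
For reference, the keep-alive agent setup described above could look roughly like this. It is a sketch under assumptions: the names `httpAgent`, `httpsAgent`, and `agentFor` are hypothetical, and whether a pooled connection would reintroduce the premature close on affected Gitea hosts has not been tested here.

```typescript
import { Agent as HttpAgent } from 'node:http';
import { Agent as HttpsAgent } from 'node:https';

// One pooled agent per protocol so paginated requests can reuse sockets.
const httpAgent = new HttpAgent({ keepAlive: true });
const httpsAgent = new HttpsAgent({ keepAlive: true });

// Picks the agent matching the request protocol. node-fetch v2 accepts the
// `agent` option either as an Agent instance or as a function of the parsed URL.
function agentFor(parsedUrl: URL): HttpAgent | HttpsAgent {
    return parsedUrl.protocol === 'http:' ? httpAgent : httpsAgent;
}
```

Under node-fetch v2 this would be passed as `fetch(url, { ...options, agent: agentFor })`. Node's built-in `fetch` does not take an `agent` option, so the approach only applies while cross-fetch resolves to node-fetch.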


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== root package.json ==\n'
sed -n '1,220p' package.json

printf '\n== backend package.json ==\n'
sed -n '1,220p' packages/backend/package.json

printf '\n== cross-fetch/node-fetch package metadata in lockfile ==\n'
awk '
  $0 ~ /^"cross-fetch@npm:\^4\.0\.0":$/ {show=1}
  show {print}
  show && NF==0 {exit}
' yarn.lock

printf '\n== node-fetch 2.7.0 metadata in lockfile ==\n'
awk '
  $0 ~ /^"node-fetch@npm:\^2\.6\.7, node-fetch@npm:\^2\.7\.0":$/ {show=1}
  show {print}
  show && NF==0 {exit}
' yarn.lock

Repository: sourcebot-dev/sourcebot

Length of output: 6159


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== paginate usage in gitea.ts ==\n'
rg -n 'paginate\(' packages/backend/src/gitea.ts

printf '\n== connectionUtils.ts ==\n'
sed -n '1,220p' packages/backend/src/connectionUtils.ts

Repository: sourcebot-dev/sourcebot

Length of output: 1447


Avoid Connection: close on every Gitea request. This path paginates org/repo/user listings, so a sync can make many requests; forcing a fresh connection each time drops keep-alive reuse and adds avoidable TCP/TLS overhead. Gate the workaround to affected hosts or use a keep-alive agent instead.
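
A host-gated variant of the workaround might be sketched like this. The allow-list `HOSTS_NEEDING_CLOSE`, its example hostname, and the helper `giteaHeadersFor` are all hypothetical. How affected hosts would actually be identified (config, environment variable, retry on failure) is left open by the review.

```typescript
// Hypothetical allow-list of hosts known to trigger the premature close.
const HOSTS_NEEDING_CLOSE = new Set(['gitea.internal.example.com']);

// Always force identity encoding; only force Connection: close for listed hosts,
// so other hosts keep normal connection reuse.
function giteaHeadersFor(url: string, base: Record<string, string> = {}): Record<string, string> {
    const headers: Record<string, string> = { ...base, 'Accept-Encoding': 'identity' };
    if (HOSTS_NEEDING_CLOSE.has(new URL(url).hostname)) {
        headers['Connection'] = 'close';
    }
    return headers;
}
```

The trade-off is that an affected host missing from the list would still hit the original failure, so the unconditional header in the PR is the safer default if the per-request overhead is acceptable.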

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/gitea.ts` around lines 15 - 29, The custom fetch wrapper
in gitea.ts is forcing Connection: close for every request, which breaks
keep-alive reuse during paginated org/repo/user syncs. Update customFetch to
apply this workaround only for affected Gitea hosts or error-prone cases, or
switch to a keep-alive agent while still preserving the Accept-Encoding identity
workaround. Keep the change localized to customFetch so the rest of the Gitea
request flow continues using normal connection reuse.

Comment on lines +230 to +233
if (response.error || !response.data) {
    throw response.error ?? new Error(`Received empty response body while fetching repository ${repo}`);
}


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect getReposForOrgs/getReposOwnedByUsers for equivalent response validation
fd gitea.ts packages/backend/src --exec sed -n '96,219p' {}

Repository: sourcebot-dev/sourcebot

Length of output: 3484


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate paginate and the repo listing callers/validators
rg -n "const paginate|function paginate|paginate\(" packages/backend/src -g '*.ts' -g '*.tsx'

# Show the relevant sections around paginate and repoGet in gitea.ts
file="packages/backend/src/gitea.ts"
start=$(rg -n "const paginate|function paginate" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
  sed -n "$((start-20)),$((start+120))p" "$file"
fi

echo "---- repoGet section ----"
rg -n "repoGet\(|Received empty response body|response\.error|!response\.data" "$file" -n -A4 -B4

echo "---- repoCompileUtils non-null assertions ----"
rg -n "full_name!|clone_url!|id!" packages/backend/src/repoCompileUtils.ts -n -A3 -B3

Repository: sourcebot-dev/sourcebot

Length of output: 5303


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect measure() and any existing handling of HttpResponse errors/empty payloads
file="packages/backend/src/gitea.ts"

rg -n "const measure|function measure|measure\(" packages/backend/src -g '*.ts'

echo "---- measure and surrounding helpers ----"
start=$(rg -n "const measure|function measure" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
  sed -n "$((start-40)),$((start+80))p" "$file"
fi

echo "---- HttpResponse usage in gitea.ts ----"
rg -n "HttpResponse|response\.error|response\.data|data: response" "$file" -n -A3 -B3

echo "---- API client type definitions if present ----"
fd -a "types.ts" packages/backend/src node_modules 2>/dev/null | head -n 20

Repository: sourcebot-dev/sourcebot

Length of output: 2613


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read measure() and the gitea pagination helper
sed -n '1,80p' packages/backend/src/utils.ts
echo "----"
sed -n '120,320p' packages/backend/src/gitea.ts

# Find where HttpResponse comes from if it's referenced in the repo
rg -n "interface HttpResponse|type HttpResponse|HttpResponse<" packages/backend/src -g '*.ts'

Repository: sourcebot-dev/sourcebot

Length of output: 7603


Guard the paginated org/user repo-listing responses
packages/backend/src/gitea.ts:145,185 getReposOwnedByUsers() / getReposForOrgs() still rely on paginate() without the same empty/error-response guard, so a premature-close response can surface as an opaque runtime failure in these sync paths.
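
The per-page guard being asked for could take roughly this shape. This is an illustrative sketch only: `PageResponse`, `collectPages`, and the stop-on-empty-page convention are assumptions, and the real `paginate()` helper in `gitea.ts` may have a different signature.

```typescript
// Hypothetical response shape, modelled on the { data, error } pair used by repoGet.
interface PageResponse<T> {
    data: T[] | null;
    error: unknown;
}

// Fetches pages until an empty one is returned, validating each response
// before its items are accumulated.
async function collectPages<T>(
    fetchPage: (page: number) => Promise<PageResponse<T>>,
    context: string,
): Promise<T[]> {
    const results: T[] = [];
    for (let page = 1; ; page++) {
        const response = await fetchPage(page);
        if (response.error || !response.data) {
            throw response.error ?? new Error(
                `Received empty response body while listing repositories for ${context} (page ${page})`,
            );
        }
        if (response.data.length === 0) {
            return results;
        }
        results.push(...response.data);
    }
}
```

Including the org or user name and the page number in the message is what turns the opaque failure into one that can be traced to a specific request.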

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/gitea.ts` around lines 230 - 233, The paginated repo
listing paths in getReposOwnedByUsers() and getReposForOrgs() are missing the
same empty/error-response guard used elsewhere, so a premature-close from
paginate() can bubble up as an opaque failure. Update the pagination handling in
these functions in gitea.ts to explicitly check for response.error or missing
response.data after each page fetch, and throw a clear error with the relevant
repo/org context before mapping or accumulating results.

Development

Successfully merging this pull request may close these issues.

[bug] Gitea sync fails with ERR_STREAM_PREMATURE_CLOSE, then crashes on null full_name

1 participant