fix(git): normalize git_log schema across filtered and unfiltered branches #4470

Open
Sagargupta16 wants to merge 1 commit into modelcontextprotocol:main from Sagargupta16:fix/git-log-schema

Summary

Closes #4469.

`git_log` returned two different string shapes depending on whether `start_timestamp`/`end_timestamp` was passed:

Before (unfiltered branch — server.py:187-197)

Used `commit.hexsha!r` / `commit.author!r`, which produced repr-quoted values:

```
Commit: 'a1b2c3d4e5f6...'
Author: <git.Actor "Name <email>">
Date: 2026-07-01 12:00:00+00:00
Message: 'subject\n\nbody\n'
```

Before (filtered branch — server.py:180-185)

Used `git log --format=%H%n%an%n%ad%n%s%n` and split the output, producing raw values but losing the commit body (`%s` is subject-only):

```
Commit: a1b2c3d4e5f6...
Author: Name
Date: Wed Jul 1 12:00:00 2026 +0000
Message: subject
```

Downstream parsers that split on `\n` and `:` had to special-case which branch produced the entry.
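To make the special-casing concrete, here is a minimal sketch of such a downstream parser, assuming entries are newline-separated `Key: value` lines as in the samples above. `parse_entry` and `strip_repr_quotes` are hypothetical names used for illustration only; they are not part of the server.

```python
def parse_entry(entry: str) -> dict[str, str]:
    """Split a block of 'Key: value' lines into a dict, one field per line."""
    fields = {}
    for line in entry.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that carry no 'Key: ' prefix
            fields[key] = value
    return fields


def strip_repr_quotes(value: str) -> str:
    """Undo the repr() quoting that only the old unfiltered branch applied."""
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value[1:-1]
    return value


# Illustrative entries in the two pre-fix shapes (hashes shortened).
old_unfiltered = "Commit: 'a1b2c3d4'\nMessage: 'subject\\n\\nbody\\n'"
old_filtered = "Commit: a1b2c3d4\nMessage: subject"

raw = parse_entry(old_unfiltered)["Commit"]        # still quoted
normalized = strip_repr_quotes(raw)                # the extra step the old schema forced
same = normalized == parse_entry(old_filtered)["Commit"]
```

With a single output shape, the `strip_repr_quotes` step becomes unnecessary.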

Fix

Collapse both branches onto a single `repo.iter_commits(max_count=..., since=..., until=...)` call, so there is exactly one code path and schema drift is impossible by construction. Also drop the `!r` formatting so both filtered and unfiltered entries emit raw values (bare commit hash, plain author `Name <email>`, full `commit.message` including body).
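The shape of the unified path can be sketched roughly as follows. This is an illustration under assumptions, not the code in `server.py`: the function signature, the `format_commit` helper, and the exact author rendering are guesses, and `FakeRepo` is a stand-in so the sketch runs without a repository. The only GitPython behavior relied on is that `iter_commits` accepts `max_count`, `since`, and `until` as keyword arguments.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_commit(commit) -> str:
    """Render one commit with raw values (no !r), identically for every call."""
    return (
        f"Commit: {commit.hexsha}\n"
        # Assumption: author is rendered as 'name <email>'.
        f"Author: {commit.author.name} <{commit.author.email}>\n"
        f"Date: {commit.authored_datetime}\n"
        f"Message: {commit.message}\n"
    )


def git_log(repo, max_count=10, start_timestamp=None, end_timestamp=None):
    """Single code path: optional filters only add since/until kwargs."""
    kwargs = {"max_count": max_count}
    if start_timestamp:
        kwargs["since"] = start_timestamp
    if end_timestamp:
        kwargs["until"] = end_timestamp
    return [format_commit(c) for c in repo.iter_commits(**kwargs)]


class FakeRepo:
    """Hypothetical stand-in for a GitPython Repo; records the kwargs it receives."""

    def __init__(self, commits):
        self.commits = commits
        self.calls = []

    def iter_commits(self, **kwargs):
        self.calls.append(kwargs)
        return iter(self.commits[: kwargs["max_count"]])


commit = SimpleNamespace(
    hexsha="a1b2c3d4e5f6",
    author=SimpleNamespace(name="Name", email="name@example.com"),
    authored_datetime=datetime(2026, 7, 1, 12, 0, tzinfo=timezone.utc),
    message="subject\n\nbody\n",
)
repo = FakeRepo([commit])
unfiltered = git_log(repo)
filtered = git_log(repo, start_timestamp="2026-06-01")
```

Because formatting happens in one place, the filtered and unfiltered calls differ only in which commits are selected, never in how they are rendered.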

The flag-injection defense (`start_timestamp.startswith("-")`) is preserved and now applies uniformly whether or not the fast path is taken.
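As a rough sketch of that defense: a timestamp beginning with `-` could be interpreted by git as an option rather than a date, so it is rejected before it reaches the command line. The helper name `reject_flag_like` and the choice of `ValueError` are assumptions for illustration; the PR only states that the `startswith("-")` check is kept.

```python
def reject_flag_like(value, name):
    """Refuse values that git could interpret as a command-line option."""
    if value is not None and value.startswith("-"):
        raise ValueError(f"{name} must not start with '-': {value!r}")
    return value


# A normal date passes through unchanged; None (no filter) is also allowed.
ok = reject_flag_like("2026-07-01", "start_timestamp")
none_ok = reject_flag_like(None, "end_timestamp")

# A flag-shaped value is rejected.
try:
    reject_flag_like("--output=/tmp/x", "start_timestamp")
    rejected = False
except ValueError:
    rejected = True
```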

Tests

Added `test_git_log_schema_matches_across_filter_branches` that asserts:

  • Same key order (`Commit`, `Author`, `Date`, `Message`) across both filtered and unfiltered calls
  • Neither branch emits repr-quoted commit hashes (guards against the leading-`'` regression)
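The two assertions above could look roughly like this. It is a sketch of the idea only, operating on already-rendered entry strings; the actual test in the PR is not shown here, and `entry_keys`, `commit_value`, and `check_schema` are hypothetical helper names.

```python
EXPECTED_KEYS = ["Commit", "Author", "Date", "Message"]


def entry_keys(entry: str) -> list[str]:
    """Return the known field labels in the order they appear in one entry."""
    keys = []
    for line in entry.splitlines():
        key, sep, _ = line.partition(": ")
        if sep and key in EXPECTED_KEYS:
            keys.append(key)
    return keys


def commit_value(entry: str) -> str:
    """Return the text after 'Commit: ' in one entry."""
    for line in entry.splitlines():
        if line.startswith("Commit: "):
            return line[len("Commit: "):]
    raise ValueError("entry has no Commit line")


def check_schema(filtered_entry: str, unfiltered_entry: str) -> None:
    for entry in (filtered_entry, unfiltered_entry):
        # Same key order in both branches.
        assert entry_keys(entry) == EXPECTED_KEYS
        # No repr() quoting on the hash (leading single quote).
        assert not commit_value(entry).startswith("'")


good = "Commit: a1b2c3d4\nAuthor: Name\nDate: 2026-07-01 12:00:00+00:00\nMessage: subject\n\nbody\n"
old_style = "Commit: 'a1b2c3d4'\nAuthor: Name\nDate: 2026-07-01 12:00:00+00:00\nMessage: subject\n"

check_schema(good, good)  # passes: identical shape, raw hash

try:
    check_schema(good, old_style)
    caught_regression = False
except AssertionError:
    caught_regression = True
```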

All existing `git_log` tests pass; the unrelated `test_validate_repo_path_symlink_escape` failure on Windows is pre-existing on upstream (verified via stash).

Compatibility note

The `Date:` line for the unfiltered branch still uses `commit.authored_datetime`, as it did before; the filtered branch's format string is gone. Any external consumer that pattern-matched on the raw git-format date (`Wed Jul 1 12:00:00 2026 +0000`) now gets the ISO-style datetime for both branches, which is the consistent shape #4469 asks for.
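For consumers that need to handle both shapes during a transition, a parser along these lines may help. It is a sketch, not part of this PR: `parse_log_date` is a hypothetical helper, and the `strptime` pattern assumes git's default date format under an English locale.

```python
from datetime import datetime

# Git's default date format, e.g. "Wed Jul 1 12:00:00 2026 +0000".
GIT_DEFAULT_DATE = "%a %b %d %H:%M:%S %Y %z"


def parse_log_date(value: str) -> datetime:
    """Parse the new ISO-style date, falling back to the old git-format date."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return datetime.strptime(value, GIT_DEFAULT_DATE)


new_style = parse_log_date("2026-07-01 12:00:00+00:00")
old_style = parse_log_date("Wed Jul 1 12:00:00 2026 +0000")
same_instant = new_style == old_style
```

Both strings describe the same instant, so a consumer that parses dates rather than pattern-matching on them should be unaffected by the change.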

…nches

git_log emitted two different string shapes depending on whether
start_timestamp or end_timestamp was passed:

  Unfiltered branch used commit.hexsha!r / commit.author!r, producing
  repr()-quoted values with leading single quotes and Actor angle
  brackets, and included the full commit body via commit.message.

  Filtered branch used git log --format=%H%n%an%n%ad%n%s%n, producing
  raw unquoted values, and dropped the commit body entirely (%s is
  subject only).

Downstream parsers that split on "\n" and ":" have to special-case
which branch produced the entry. Reported in modelcontextprotocol#4469.

Collapse both branches onto repo.iter_commits() with since/until
kwargs, so there is exactly one code path. Drop the repr() formatting
so both branches emit raw values (bare commit hash, author name/email,
full message).

Add a regression test asserting the two branches produce the same
key order and neither emits repr artefacts.

Closes modelcontextprotocol#4469

Development

Successfully merging this pull request may close these issues.

git: git_log output schema differs between filtered vs unfiltered branches