Rebase shears/next: 1 conflict(s) (0 skipped, 1 resolved) (#28052942691) #259
Open
gitforwindowshelper[bot] wants to merge 317 commits into
When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concrete data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible to the different kinds of output that will be implemented in future changes. Signed-off-by: Derrick Stolee <stolee@gmail.com>
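The column-width pass described above could look roughly like the following. This is a minimal sketch, not the structure used by `git survey`: the `table_sketch` type, the function names, and the left-aligned ` | ` layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the table type; not the real `git survey` struct. */
struct table_sketch {
	size_t nr_cols;
	size_t nr_rows;
	const char *const *cells; /* row-major; row 0 holds the header */
};

/* First pass: record the widest cell of every column. */
void column_widths(const struct table_sketch *t, size_t *widths)
{
	for (size_t c = 0; c < t->nr_cols; c++)
		widths[c] = 0;
	for (size_t r = 0; r < t->nr_rows; r++)
		for (size_t c = 0; c < t->nr_cols; c++) {
			size_t len = strlen(t->cells[r * t->nr_cols + c]);
			if (len > widths[c])
				widths[c] = len;
		}
}

/*
 * Second pass: format one row, padding each cell to its column width.
 * Returns the number of characters the full row needs.
 */
size_t format_row(const struct table_sketch *t, size_t row,
		  const size_t *widths, char *buf, size_t buflen)
{
	size_t pos = 0;

	assert(buflen > 0);
	for (size_t c = 0; c < t->nr_cols && pos < buflen; c++) {
		const char *cell = t->cells[row * t->nr_cols + c];
		int n = snprintf(buf + pos, buflen - pos, "%s%-*s",
				 c ? " | " : "", (int)widths[c], cell);
		if (n < 0)
			break;
		pos += (size_t)n;
	}
	return pos;
}
```

Because the widths are known before the first row is printed, every row of the table comes out with the same total length.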
When building with `make MSVC=1 DEBUG=1`, link to `libexpatd.lib` rather than `libexpat.lib`. It appears that the `vcpkg` package for "libexpat" has changed and now creates `libexpatd.lib` for debug mode builds. Previously, both debug and release builds created a ".lib" with the same basename. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
At the moment, nothing is obvious about the reason for the use of the
path-walk API, but this will become more prevalent in future iterations. For
now, use the path-walk API to sum up the counts of each kind of object.
For example, this is the reachable object summary output for my local repo:
REACHABLE OBJECT SUMMARY
========================
Object Type | Count
------------+-------
Tags | 1343
Commits | 179344
Trees | 314350
Blobs | 184030
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Now that we have explored objects by count, we can expand that a bit more to summarize the data for the on-disk and inflated size of those objects. This information is helpful for diagnosing both why disk space (and perhaps clone or fetch times) is growing but also why certain operations are slow because the inflated size of the abstract objects that must be processed is so large. Note: zlib-ng is slightly more efficient even at those small sizes. Even between zlib versions, there are slight differences in compression. To accommodate for that in the tests, not the exact numbers but some rough approximations are validated (the test should validate `git survey`, after all, not zlib). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Create a wrapper for the Windows Resource Compiler (RC.EXE) for use by the MSVC=1 builds. This is similar to the CL.EXE and LIB.EXE wrappers used for the MSVC=1 builds. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
The winsock2 library provides functions that work on different data types than file descriptors, therefore we wrap them. But that is not the only difference: they also do not set `errno` but expect the callers to enquire about errors via `WSAGetLastError()`. Let's translate that into appropriate `errno` values whenever the socket operations fail so that Git's code base does not have to change its expectations. This closes git-for-windows#2404 Helped-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
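The translation step can be sketched as below, assuming a small subset of error codes. The `SKETCH_WSA*` constants repeat the numeric values from `<winsock2.h>` so that the example builds on any platform; the function names and the `EIO` fallback are assumptions of this sketch, not the mapping Git uses.

```c
#include <assert.h>
#include <errno.h>

/* Values as defined in <winsock2.h>, repeated here for portability. */
enum {
	SKETCH_WSAEWOULDBLOCK = 10035,
	SKETCH_WSAECONNRESET = 10054,
	SKETCH_WSAETIMEDOUT = 10060,
	SKETCH_WSAECONNREFUSED = 10061
};

/* Map a Winsock error code to the closest errno value. */
int wsa_to_errno_sketch(int wsa_error)
{
	switch (wsa_error) {
	case SKETCH_WSAEWOULDBLOCK:
		return EWOULDBLOCK;
	case SKETCH_WSAECONNRESET:
		return ECONNRESET;
	case SKETCH_WSAETIMEDOUT:
		return ETIMEDOUT;
	case SKETCH_WSAECONNREFUSED:
		return ECONNREFUSED;
	default:
		return EIO; /* assumed fallback for this sketch only */
	}
}

/*
 * Wrapper pattern: when a socket call failed, set errno from the code
 * that the caller obtained via WSAGetLastError(); pass the result through.
 */
int finish_socket_call(int result, int wsa_error)
{
	if (result < 0) {
		assert(wsa_error != 0); /* a failure must carry an error code */
		errno = wsa_to_errno_sketch(wsa_error);
	}
	return result;
}
```

With a wrapper of this shape, callers keep checking `errno` after a failed socket call and need no Windows-specific branch.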
Teach MSVC=1 builds to depend on the `git.rc` file so that the resulting executables have Windows-style resources and version number information within them. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
This compile-time option allows asking Git to load libcurl dynamically at runtime. Together with a follow-up patch that optionally overrides the file name depending on the `http.sslBackend` setting, this kicks open the door for installing multiple libcurl flavors side by side, and loading the one corresponding to the (runtime-)configured SSL/TLS backend. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In future changes, we will make use of these methods. The intention is to keep track of the top contributors according to some metric. We don't want to store all of the entries and do a sort at the end, so track a constant-size table and remove rows that get pushed out depending on the chosen sorting algorithm. Co-authored-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>
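One way to keep such a constant-size table is an insertion into a small sorted array, where the last row falls off once the table is full. The sketch below assumes a single numeric metric sorted in descending order; the type names, the capacity of 3, and `top_table_insert()` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOP_N_SKETCH 3 /* hypothetical fixed capacity */

struct top_entry {
	const char *name;
	uint64_t value;
};

struct top_table {
	size_t nr;
	struct top_entry rows[TOP_N_SKETCH];
};

/*
 * Insert (name, value), keeping rows sorted by descending value.
 * Returns 1 if the entry was stored, 0 if it did not make the cut.
 */
int top_table_insert(struct top_table *t, const char *name, uint64_t value)
{
	size_t pos = t->nr;

	assert(t->nr <= TOP_N_SKETCH);

	/* Find the first row that the new value beats. */
	for (size_t i = 0; i < t->nr; i++)
		if (value > t->rows[i].value) {
			pos = i;
			break;
		}
	if (pos == TOP_N_SKETCH)
		return 0; /* table is full and no row is smaller */

	/* Grow while there is room; otherwise the last row is pushed out. */
	if (t->nr < TOP_N_SKETCH)
		t->nr++;
	for (size_t i = t->nr - 1; i > pos; i--)
		t->rows[i] = t->rows[i - 1];
	t->rows[pos].name = name;
	t->rows[pos].value = value;
	return 1;
}
```

Memory use stays constant no matter how many paths are visited, at the cost of a short linear scan per candidate.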
We map WSAGetLastError() errors to errno errors in winsock_error_to_errno(), but the MSVC strerror() implementation only produces "Unknown error" for most of them. Produce some more meaningful error messages in these cases. Our builds for ARM64 link against the newer UCRT strerror() that does know these errors, so we won't change the strerror() used there. The wording of the messages is copied from glibc strerror() messages. Reported-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Although NTLM authentication is considered weak (extending even to
NTLMv2, which purportedly allows brute-forcing reasonably complex
8-character passwords in a matter of days, given ample compute
resources), it _is_ one of the authentication methods supported by
libcurl.
Note: The added test case *cannot* reuse the existing `custom_auth`
facility. The reason is that that facility is backed by an NPH script
("No Parse Headers"), which does not allow handling the 3-phase NTLM
authentication correctly (in my hands, the NPH script would not even be
called upon the Type 3 message, a "200 OK" would be returned, but no
headers, let alone the `git http-backend` output as payload). Having a
separate NTLM authentication script makes the exact workings clearer and
more readable, anyway.
Co-authored-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A change between versions 2.4.1 and 2.6.0 of the MSYS2 runtime modified how Cygwin's runtime (and hence Git for Windows' MSYS2 runtime derivative) handles locales: d16a56306d (Consolidate wctomb/mbtowc calls for POSIX-1.2008, 2016-07-20). An unintended side-effect is that "cold-calling" into the POSIX emulation will start with a locale based on the current code page, something that Git for Windows is very ill-prepared for, as it expects to be able to pass a command-line containing non-ASCII characters to the shell without having those characters munged. One symptom of this behavior: when `git clone` or `git fetch` shell out to call `git-upload-pack` with a path that contains non-ASCII characters, the shell tried to interpret the entire command-line (including command-line parameters) as executable path, which obviously must fail. This fixes git-for-windows#1036 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Ignore the `-fno-stack-protector` compiler argument when building with MSVC. This will be used in a later commit that needs to build a Win32 GUI app. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
This implements the Windows-specific support code, because everything is slightly different on Windows, even loading shared libraries. Note: I specifically do _not_ use the code from `compat/win32/lazyload.h` here because that code is optimized for loading individual functions from various system DLLs, while we specifically want to load _many_ functions from _one_ DLL here, and distinctly not a system DLL (we expect libcurl to be located outside `C:\Windows\system32`, something `INIT_PROC_ADDR` refuses to work with). Also, the `curl_easy_getinfo()`/`curl_easy_setopt()` functions are declared as vararg functions, which `lazyload.h` cannot handle. Finally, we are about to optionally override the exact file name that is to be loaded, which is a goal contrary to `lazyload.h`'s design. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since we are already walking our reachable objects using the path-walk API,
let's now collect lists of the paths that contribute most to different
metrics. Specifically, we care about
* Number of versions.
* Total size on disk.
* Total inflated size (no delta or zlib compression).
This information can be critical to discovering which parts of the
repository are causing the most growth, especially on-disk size. Different
packing strategies might help compress data more efficiently, but the total
inflated size is a representation of the raw size of all snapshots of those
paths. Even when stored efficiently on disk, that size represents how much
information must be processed to complete a command such as 'git blame'.
The exact disk size seems to be not quite robust enough for testing, as
could be seen by the `linux-musl-meson` job consistently failing, possibly
because zlib-ng deflates differently: t8100.4 (git survey (default)) was
failing with a symptom like this:
TOTAL OBJECT SIZES BY TYPE
===============================================
Object Type | Count | Disk Size | Inflated Size
------------+-------+-----------+--------------
- Commits | 10 | 1523 | 2153
+ Commits | 10 | 1528 | 2153
Trees | 10 | 495 | 1706
Blobs | 10 | 191 | 101
- Tags | 4 | 510 | 528
+ Tags | 4 | 547 | 528
This means: the disk size is unlikely to be something we can verify robustly.
Since zlib-ng seems to increase the disk size of the tags from 510 to
547 (above the inflated size of 528), we cannot even assume that the disk size is always smaller than the
inflated size. We will most likely want to either skip verifying the
disk size altogether, or go for some kind of fuzzy matching, say, by
replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /`
or something like that.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This comment has been true for the longest time; the combination of the two preceding commits made it incorrect, so let's drop that comment. Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
NTLM authentication is relatively weak. This is the case even with the
default setting of modern Windows versions, where NTLMv1 and LanManager
are disabled and only NTLMv2 is enabled: NTLMv2 hashes of even
reasonably complex 8-character passwords can be broken in a matter of
days, given enough compute resources.
Even worse: On Windows, NTLM authentication uses Security Support
Provider Interface ("SSPI"), which provides the credentials without
requiring the user to type them in.
Which means that an attacker could talk an unsuspecting user into
cloning from a server that is under the attacker's control and extracts
the user's NTLMv2 hash without their knowledge.
For that reason, let's disallow NTLM authentication by default.
NTLM authentication is quite simple to set up, though, and therefore
there are still some on-prem Azure DevOps setups out there whose users
and/or automation rely on this type of authentication. To give them an
escape hatch, introduce the `http.<url>.allowNTLMAuth` config setting
that can be set to `true` to opt back into using NTLM for a specific
remote repository.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows wants to add `git.exe` to the users' `PATH`, without cluttering the latter with unnecessary executables such as `wish.exe`. To that end, it invented the concept of its "Git wrapper", i.e. a tiny executable located in `C:\Program Files\Git\cmd\git.exe` (originally a CMD script) whose sole purpose is to set up a couple of environment variables and then spawn the _actual_ `git.exe` (which nowadays lives in `C:\Program Files\Git\mingw64\bin\git.exe` for 64-bit, and the obvious equivalent for 32-bit installations).

Currently, the following environment variables are set unless already initialized:

- `MSYSTEM`, to make sure that the MSYS2 Bash and the MSYS2 Perl interpreter behave as expected,
- `PLINK_PROTOCOL`, to force PuTTY's `plink.exe` to use the SSH protocol instead of Telnet, and
- `PATH`, to make sure that the `bin` folder in the user's home directory, as well as the `/mingw64/bin` and the `/usr/bin` directories are included. The trick here is that the `/mingw64/bin/` and `/usr/bin/` directories are relative to the top-level installation directory of Git for Windows (which the included Bash interprets as `/`, i.e. as the MSYS pseudo root directory).

Using the absence of `MSYSTEM` as a tell-tale, we can detect in `git.exe` whether these environment variables have been initialized properly. Therefore we can call `C:\Program Files\Git\mingw64\bin\git` in-place after this change, without having to call Git through the Git wrapper.

Obviously, above-mentioned directories must be _prepended_ to the `PATH` variable, otherwise we risk picking up executables from unrelated Git installations. We do that by constructing the new `PATH` value from scratch, appending `$HOME/bin` (if `HOME` is set), then the MSYS2 system directories, and then appending the original `PATH`.

Side note: this modification of the `PATH` variable is independent of the modification necessary to reach the executables and scripts in `/mingw64/libexec/git-core/`, i.e. the `GIT_EXEC_PATH`.
That modification is still performed by Git, elsewhere, long after making the changes described above.

While we _still_ cannot simply hard-link `mingw64\bin\git.exe` to `cmd` (because the former depends on a couple of `.dll` files that are only in `mingw64\bin`, i.e. calling `...\cmd\git.exe` would fail to load due to missing dependencies), at least we can now avoid that extra process of running the Git wrapper (which then has to wait for the spawned `git.exe` to finish) by calling `...\mingw64\bin\git.exe` directly, via its absolute path.

Testing this in Git's test suite is tricky: we set up a "new" MSYS pseudo-root and copy the `git.exe` file into the appropriate location, then verify that `MSYSTEM` is set properly, and also that the `PATH` is modified so that scripts can be found in `$HOME/bin`, `/mingw64/bin/` and `/usr/bin/`.

This addresses git-for-windows#2283

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
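The `PATH` construction described in this commit message can be illustrated as follows. It is only a sketch under stated assumptions: `needs_env_setup()` and `build_path_sketch()` are hypothetical names, the `;` separator and backslash layout are assumed, and the installation root is passed in as a parameter instead of being derived from the location of the executable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The absence of MSYSTEM is the tell-tale that setup has not happened yet. */
int needs_env_setup(const char *msystem)
{
	return msystem == NULL;
}

/*
 * Build the new PATH from scratch:
 *   <home>\bin (only if home is set), <root>\mingw64\bin, <root>\usr\bin,
 *   followed by the original PATH.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_path_sketch(char *buf, size_t buflen, const char *home,
		      const char *root, const char *old_path)
{
	int n;

	assert(buf && root && old_path);
	n = snprintf(buf, buflen, "%s%s%s\\mingw64\\bin;%s\\usr\\bin;%s",
		     home ? home : "", home ? "\\bin;" : "",
		     root, root, old_path);
	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Because the original `PATH` comes last, executables from an unrelated Git installation that happen to be on the original `PATH` are found only after the ones belonging to this installation.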
Move the default `-ENTRY` and `-SUBSYSTEM` arguments for MSVC=1 builds from `config.mak.uname` into `clink.pl`. These args are constant for console-mode executables. Add support to `clink.pl` for generating a Win32 GUI application using the `-mwindows` argument (to match how GCC does it). This changes the `-ENTRY` and `-SUBSYSTEM` arguments accordingly. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
The previous commits introduced a compile-time option to load libcurl lazily, but it uses the hard-coded name "libcurl-4.dll" (or equivalent on platforms other than Windows). To allow for installing multiple libcurl flavors side by side, where each supports one specific SSL/TLS backend, let's first look whether `libcurl-<backend>-4.dll` exists, and only use `libcurl-4.dll` as a fall back. That will allow us to ship with a libcurl by default that only supports the Secure Channel backend for the `https://` protocol. This libcurl won't suffer from any dependency problem when upgrading OpenSSL to a new major version (which will change the DLL name, and hence break every program and library that depends on it). This is crucial because Git for Windows relies on libcurl to keep working when building and deploying a new OpenSSL package because that library is used by `git fetch` and `git clone`. Note that this feature is by no means specific to Windows. On Ubuntu, for example, a `git` built using `LAZY_LOAD_LIBCURL` will use `libcurl.so.4` for `http.sslbackend=openssl` and `libcurl-gnutls.so.4` for `http.sslbackend=gnutls`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
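The lookup order (backend-specific name first, generic name as fallback) might be sketched like this. The function names are hypothetical, the existence check is injected as a callback so the example does not touch the file system, and `probe_openssl_only()` is a stand-in that pretends only the OpenSSL flavor is installed next to the default library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in probe: pretend only the OpenSSL-specific flavor exists. */
int probe_openssl_only(const char *name)
{
	return strcmp(name, "libcurl-openssl-4.dll") == 0;
}

/*
 * Return the library name to load: "libcurl-<backend>-4.dll" if the
 * probe reports that it exists, otherwise the generic "libcurl-4.dll".
 */
const char *pick_libcurl_name(char *buf, size_t buflen, const char *backend,
			      int (*exists)(const char *name))
{
	assert(buf && exists);
	if (backend && *backend) {
		int n = snprintf(buf, buflen, "libcurl-%s-4.dll", backend);
		if (n > 0 && (size_t)n < buflen && exists(buf))
			return buf;
	}
	return "libcurl-4.dll";
}
```

A real loader would hand the chosen name to the platform's library-loading call; this sketch leaves that step out.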
The 'git survey' builtin provides several detail tables, such as "top files by on-disk size". The size of these tables defaults to 10, currently. Allow the user to specify this number via a new --top=<N> option or the new survey.top config key. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Commit 2406bf5 (Win32: detect unix socket support at runtime, 2024-04-03) introduced a runtime detection for whether the operating system supports unix sockets for Windows, but a mistake snuck into the tests. When building and testing Git without NO_UNIX_SOCKETS we currently skip t0301-credential-cache on Windows if unix sockets are supported and run the tests if they aren't. Flip that logic to actually work the way it was intended. Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git now disables NTLM authentication by default. To help users find the escape hatch of that config setting, should they need it, suggest it when the authentication failed and the server had offered NTLM, i.e. if re-enabling it would fix the problem. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 436a422 (max_tree_depth: lower it for clangarm64 on Windows, 2025-04-23), I provided a work-around for a nasty issue with clangarm builds, where the stack is exhausted before the maximal tree depth is reached, and the resulting error cannot easily be handled by Git (because it would require Windows-specific handling). Turns out that this is not at all limited to ARM64. In my tests with CLANG64 in MSYS2 on the GitHub Actions runners, the test t6700.4 failed in the exact same way. What's worse: The limit needs to be quite a bit lower for x86_64 than for aarch64. In aforementioned tests, the breaking point was 1232: With 1231 it still worked as expected, with 1232 it would fail with the `STATUS_STACK_OVERFLOW` incorrectly mapped to exit code 127. For comparison, in my tests on GitHub Actions' Windows/ARM64 runners, the breaking point was 1439 instead. Therefore the condition needs to be adapted once more, to accommodate (with some safety margin) both aarch64 and x86_64 in clang-based builds on Windows, to let that test pass. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…ctory Internally, Git expects the environment variable `HOME` to be set, and to point to the current user's home directory. This environment variable is not set by default on Windows, and therefore Git tries its best to construct one if it finds `HOME` unset. There are actually two different approaches Git tries: first, it looks at `HOMEDRIVE`/`HOMEPATH` because this is widely used in corporate environments with roaming profiles, and a user generally wants their global Git settings to be in a roaming profile. Only when `HOMEDRIVE`/`HOMEPATH` is either unset or does not point to a valid location, Git will fall back to using `USERPROFILE` instead. However, starting with Windows Vista, for secondary logons and services, the environment variables `HOMEDRIVE`/`HOMEPATH` point to Windows' system directory (usually `C:\Windows\system32`). That is undesirable, and that location is usually write-protected anyway. So let's verify that the `HOMEDRIVE`/`HOMEPATH` combo does not point to Windows' system directory before using it, falling back to `USERPROFILE` if it does. This fixes git-for-windows#2709 Initial-Path-by: Ivan Pozdeev <vano@mail.mipt.ru> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
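The fallback order can be sketched as a pure function over the relevant environment values. This is an illustration under assumptions: the names are hypothetical, the system directory is passed in as a parameter, and the check that the `HOMEDRIVE`/`HOMEPATH` location is a valid directory is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive comparison, since Windows paths ignore case. */
int path_equal_icase(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

/*
 * Prefer HOMEDRIVE + HOMEPATH unless the combination points at the
 * Windows system directory; otherwise fall back to USERPROFILE.
 */
const char *choose_home_sketch(char *buf, size_t buflen,
			       const char *homedrive, const char *homepath,
			       const char *userprofile, const char *system_dir)
{
	assert(buf && system_dir);
	if (homedrive && homepath) {
		int n = snprintf(buf, buflen, "%s%s", homedrive, homepath);
		if (n > 0 && (size_t)n < buflen &&
		    !path_equal_icase(buf, system_dir))
			return buf;
	}
	return userprofile;
}
```

For a secondary logon where `HOMEDRIVE`/`HOMEPATH` resolve to `C:\Windows\system32`, the function returns the `USERPROFILE` value instead.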
headless-git is a git executable that does not open a console window. It is useful when other GUI executables want to call git. We should install it together with git on Windows. Signed-off-by: Yuyi Wang <Strawberry_Str@hotmail.com>
winuser.h contains the definition of RT_MANIFEST that our LLVM based toolchain needs to understand that we want to embed compat/win32/git.manifest as an application manifest. It currently just embeds it as additional data that Windows doesn't understand. This also helps our GCC based toolchain understand that we only want one copy embedded. It currently embeds one working assembly manifest and one nearly identical, but useless copy as additional data. This also teaches our Visual Studio based buildsystems to pick up the manifest file from git.rc. This means we don't have to explicitly specify it in contrib/buildsystems/Generators/Vcxproj.pm anymore. Slightly counter-intuitively this also means we have to explicitly tell Cmake not to embed a default manifest. This fixes git-for-windows#4707 Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
This will help with Git for Windows' maintenance going forward: It allows Git for Windows to switch its primary libcurl to a variant without the OpenSSL backend, while still loading an alternate when setting `http.sslBackend = openssl`. This is necessary to avoid maintenance headaches with upgrading OpenSSL: its major version name is encoded in the shared library's file name and hence major version updates (temporarily) break libraries that are linked against the OpenSSL library. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation around large object handling: with deflate_it() and the locals around it widened, the cast_size_t_to_ulong() shim the prior delta_delta() widening had to leave behind in emit_binary_diff_body() goes away. deflate_it() is file-static; the only callers are the two in emit_binary_diff_body() already touched here. emit_diff_symbol() formats the resulting sizes via uintmax_t / %"PRIuMAX", so the diff output is not affected; only the per-process upper bound on a binary patch chunk that this function can address grows beyond 4 GiB on Windows. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for the upcoming read_blob_data_from_index() widening, whose callers in convert.c feed the size they receive straight into these two helpers. Both are file-static, so the change is contained. Also fixes a small pre-existing narrowing on the get_wt_convert_stats_ascii() path, where strbuf.len (size_t) was passed to an unsigned long parameter. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. read_blob_data_from_index() reads the blob through the size_t odb_read_object() API but writes the size back through an unsigned long out-parameter, silently truncating anything past 4 GiB on Windows. Widen the out-parameter, drop the cast_size_t_to_ulong() shim, and move the matching locals in the two convert.c callers and the one in attr.c. Their downstream consumers (gather_convert_stats() widened in the prior commit and read_attr_from_buf() already size_t) take the new type directly. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
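The silent truncation that motivates this widening can be reproduced on any platform by modelling the 32-bit `unsigned long` of 64-bit Windows with `uint32_t`. The sketch below uses hypothetical function names and does not call any Git API; it only shows what happens to a size just past 4 GiB on its way through a narrow out-parameter.

```c
#include <assert.h>
#include <stdint.h>

/* On 64-bit Windows, `unsigned long` is 32 bits wide; model it explicitly. */
typedef uint32_t narrow_ulong_sketch;

/* Old shape: the size leaves through a 32-bit out-parameter. */
void read_size_narrow(uint64_t actual, narrow_ulong_sketch *size_out)
{
	assert(size_out);
	*size_out = (narrow_ulong_sketch)actual; /* upper 32 bits are dropped */
}

/* Widened shape: the out-parameter can hold the full value. */
void read_size_wide(uint64_t actual, uint64_t *size_out)
{
	assert(size_out);
	*size_out = actual;
}
```

A blob of 4 GiB plus 5 bytes is reported as 5 bytes by the narrow variant, which is the kind of silent misreport the widening removes.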
Prep for the widenings of its callers, where size-receiving locals will become size_t (combine-diff's result_size in the immediately following commit, struct diff_filespec.size in a later topic). Body caps the parameter at 8000 anyway, so the type change is mechanical. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. With buffer_is_binary() widened in the prior commit, every consumer that the size flows into in combine-diff.c is size_t-ready, so widen grab_blob()'s out-param outright and move the matching locals at its three call sites together. grab_blob()'s body collapses to a direct odb_read_object(&size) since the bridge variable is no longer needed. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. textconv_object() fills its out-parameter from fill_textconv()'s size_t return through an unsigned long*; widen the API to match, then take advantage of the new shape where callers can. cat-file's 'c' and batch-mode 'c' branches lose their size_ul bridge variables (one site becomes a direct call, the other collapses an if/else into a single negated condition that reads as "try textconv, fall back to a raw read"). blame.c likewise drops the file_size_st bridge in fill_origin_blob() and hoists final_buf_size_st to bracket both branches in setup_scoreboard(). The latter keeps a cast_size_t_to_ulong() shim because struct blame_scoreboard.final_buf_size is still unsigned long; that field is its own topic. log.c just widens its local from unsigned long to size_t. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. The struct field already receives its writes from a size_t-shaped source (xsize_t(st.st_size), strbuf.len, fill_textconv()'s return, odb_read_object_info_extended() via oi.sizep), so on Windows it was already truncating anything past 4 GiB silently on the strbuf and textconv paths and loudly through cast_size_t_to_ulong() on the odb path. Switch the field to size_t. In diff_populate_filespec(), point oi.sizep at the field directly and drop both cast_size_t_to_ulong() shims and the size_st bridge they fed. Downstream consumers that still read .size into unsigned long locals will now silently narrow on Windows where the field exceeds 4 GiB. Each of those is its own follow-up; the writer side is the prerequisite for ever putting a >4 GiB value in the field in the first place. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The two shims that 606c192 (odb, packfile: use size_t for streaming object sizes, 2026-05-08) and the subsequent odb_read_object() widening introduced as scaffolding around get_delta()'s reads can now disappear: the previous commit widened diff_delta() to size_t, which was the last narrow consumer in this function. Widen size and base_size to size_t outright, drop the size_st / base_size_st bridging temporaries, and drop the two cast_size_t_to_ulong() calls. Net change is 4 lines smaller and one read-then-cast indirection gone from each odb read. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Companion to the prior get_delta() cleanup, and the last try_delta() piece of the >4 GiB delta-path topic. Every consumer that the function's locals fed has now been widened: SIZE() / DELTA_SIZE() to size_t (prior topic), the mem_usage out-parameter and delta_cacheable() earlier in this series, and create_delta() / create_delta_index() in the immediately preceding commits. Widen the declaration of trg_size, src_size, sizediff, max_size and sz to size_t (delta_size joins them on the same line, removing the size_t delta_size line that the create_delta() widening commit added as a stop-gap), and drop the two sz_st bridge variables together with the surrounding cast_size_t_to_ulong() calls. The result is just "odb_read_object(&sz)" on both reads. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation that this series and the merged js/objects-larger-than-4gb-on-windows topic are advancing for >4 GiB objects on Windows: with the odb readers and the zlib helpers reached from do_compress() now widened end-to-end, the last cast_size_t_to_ulong() shim in this function can be removed, and do_compress() itself can carry the new size type through. Two cast_size_t_to_ulong() shims remain in this file; they feed the tree-walk API, which is still narrow and is a separate widening topic. write_no_reuse_object()'s return type and the hashfile API are still narrow but unchanged in observable behaviour: on 64-bit Linux ulong coincides with size_t, and on Windows these were the narrow fenceposts the prior topics deliberately left in place. Their widening is left to follow-ups touching the hashfile API and the write_object() caller chain. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. final_buf_size is fed either from textconv_object()'s now-size_t out-parameter, from odb_read_object()'s size_t out-parameter (both bridged today through a final_buf_size_st local + cast_size_t_to_ulong()), or from o->file.size (mmfile_t, long). Widen the struct field, point both producers straight at it, and drop the bridge variable along with the cast. builtin/blame.c only reads the field for pointer arithmetic and comparisons, which promote cleanly. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. fast-import's helper gfi_unpack_entry() and the six size-handling sites that feed off it (store_object()'s deltalen, load_tree(), parse_from_existing(), the inline gfi_unpack_entry() caller in parse_objectish(), cat_blob(), and dereference()) all carry size_t-shaped values from the odb / unpack_entry() APIs through cast_size_t_to_ulong() bridges into unsigned long locals. With the producers (odb_read_object(), odb_read_object_peeled(), unpack_entry()) and the consumers it feeds (the zlib avail_in field from a prior commit, encode_in_pack_object_header()'s uintmax_t parameter, parse_from_commit()'s widened size parameter) all size_t-ready, the bridges and casts go away in one pass. gfi_unpack_entry() now writes into the caller's size_t directly, and the six locals collapse to plain size_t declarations. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Tidies up the bridge variable introduced in the create_delta() / diff_delta() widening commit earlier in this series. With the test helper's local do_compress() also widened to size_t in passing, the narrowing into the unsigned long delta_size local that compress expected is gone, the size_st bridge is unnecessary, and the cast goes away. encode_in_pack_object_header() takes uintmax_t and hashwrite() takes uint32_t, both unchanged. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Now that all of the call sites of this helper (which I used as a kind of "NEEDSWORK" marker) are eliminated, we can drop that helper altogether. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…r-windows#6288)
This is a small documentation improvement to `AGENTS.md`. The current "Building and Testing" section only shows `make -j15` "in a Git for Windows SDK shell" and says nothing about how to drive the build when you are not sitting in an interactive SDK shell, for example from PowerShell or from an automation agent. There are two things that are easy to get wrong in that situation, so let's write them down. The first is that a login shell is the wrong tool: `bash -l` / `bash --login` re-runs the profile scripts and is unnecessary once `MSYSTEM` and `PATH` are set explicitly. Setting `MSYSTEM=MINGW64` and prepending the SDK's `mingw64\bin` and `usr\bin` directories to `PATH`, then invoking a non-login `bash -c`, is enough to get a working build environment. The second is that when the optional Rust component fails to link (`cannot find target/release/libgitcore.a`), passing `NO_RUST=1` skips the cargo step. This is expressed as a `fixup!` for the commit that introduced `AGENTS.md`, so that it autosquashes into that commit during the next merging-rebase rather than adding a separate entry to the branch thicket.
…dows#6289) This PR contains a branch thicket on top of v2.55.0-rc1 (i.e. ready to go upstream) to continue the bulk of the `unsigned long` -> `size_t` transformation. Since all of these changes have no impact on the currently-working functionality for <4GB objects/packs/clones (modulo bugs, that is 😄), I would like to merge this before v2.55.0-rc2, still: The risk of introducing a regression is negligible, the chance for fixing the majority of problems with large clones is high.
Workflow run
Rebase Summary: next
From: 4b70e7683f (Continue improving support for 4GB+ packs/clones/objects (git-for-windows#6289), 2026-06-23) (b953e59134..4b70e7683f)
Resolved: f073cde8ad (Merge branch 'size-t/pack-objects-delta', 2026-06-07)
resolved size_t conflicts: took merge branch's direct &size passing in cat-file.c (size already size_t), kept HEAD's new functions in object-file.c, took merge branch's size_t widening in test helpers
Range-diff
1: f073cde8ad ! 1: 674a645 Merge branch 'size-t/pack-objects-delta'
@@ Metadata
  ## Commit message ##
     Merge branch 'size-t/pack-objects-delta'
+
+ ## builtin/cat-file.c ##
+ remerge CONFLICT (content): Merge conflict in builtin/cat-file.c
+ index 8bea69befe..60869b8b37 100644
+ --- builtin/cat-file.c
+ +++ builtin/cat-file.c
+@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
+ 		if (odb_read_object_info_extended(the_repository->objects, &oid, &oi, flags) < 0)
+ 			die("git cat-file: could not get object info");
+
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) {
+-			size_t s = size;
+-			buf = replace_idents_using_mailmap(buf, &s);
+-			size = s;
+-		}
+-=======
+ 		if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG))
+ 			buf = replace_idents_using_mailmap(buf, &size);
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+
+ 		printf("%"PRIuMAX"\n", (uintmax_t)size);
+ 		ret = 0;
+@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
+ 		if (!buf)
+ 			die("Cannot read object %s", obj_name);
+
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-		if (use_mailmap) {
+-			size_t s = size;
+-			buf = replace_idents_using_mailmap(buf, &s);
+-			size = s;
+-		}
+-=======
+ 		if (use_mailmap)
+ 			buf = replace_idents_using_mailmap(buf, &size);
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+
+ 		/* otherwise just spit out the data */
+ 		break;
+@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
+ 		buf = odb_read_object_peeled(the_repository->objects, &oid,
+ 					     exp_type_id, &size, NULL);
+
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-		if (use_mailmap) {
+-			size_t s = size;
+-			buf = replace_idents_using_mailmap(buf, &s);
+-			size = s;
+-		}
+-=======
+ 		if (use_mailmap)
+ 			buf = replace_idents_using_mailmap(buf, &size);
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+ 		break;
+ 	}
+ 	default:
+@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
+ 		if (!contents)
+ 			die("object %s disappeared", oid_to_hex(oid));
+
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-		if (use_mailmap) {
+-			size_t s = size;
+-			contents = replace_idents_using_mailmap(contents, &s);
+-			size = s;
+-		}
+-=======
+ 		if (use_mailmap)
+ 			contents = replace_idents_using_mailmap(contents, &size);
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+
+ 		if (type != data->type)
+ 			die("object %s changed type!?", oid_to_hex(oid));
+@@ builtin/cat-file.c: static void batch_object_write(const char *obj_name,
+ 					      &data->type, &data->size);
+ 			if (!buf)
+ 				die(_("unable to read %s"), oid_to_hex(&data->oid));
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-			buf = replace_idents_using_mailmap(buf, &s);
+-			data->size = s;
+-=======
+ 			buf = replace_idents_using_mailmap(buf, &data->size);
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+
+ 			free(buf);
+ 		}
+
+ ## object-file.c ##
+ remerge CONFLICT (content): Merge conflict in object-file.c
+ index 6cb42932dd..6453b1d6fa 100644
+ --- object-file.c
+ +++ object-file.c
+@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
+ 	return 0;
+ }
+
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-static int read_object_info_from_path(struct odb_source *source,
+-				      const char *path,
+-				      const struct object_id *oid,
+-				      struct object_info *oi,
+-				      enum object_info_flags flags)
+-{
+-	struct odb_source_files *files = odb_source_files_downcast(source);
+-	int ret;
+-	int fd;
+-	unsigned long mapsize;
+-	void *map = NULL;
+-	git_zstream stream, *stream_to_end = NULL;
+-	char hdr[MAX_HEADER_LEN];
+-	size_t size_scratch;
+-	enum object_type type_scratch;
+-	struct stat st;
+-
+-	/*
+-	 * If we don't care about type or size, then we don't
+-	 * need to look inside the object at all. Note that we
+-	 * do not optimize out the stat call, even if the
+-	 * caller doesn't care about the disk-size, since our
+-	 * return value implicitly indicates whether the
+-	 * object even exists.
+-	 */
+-	if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
+-		struct stat st;
+-
+-		if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) {
+-			ret = quick_has_loose(files->loose, oid) ? 0 : -1;
+-			goto out;
+-		}
+-
+-		if (lstat(path, &st) < 0) {
+-			ret = -1;
+-			goto out;
+-		}
+-
+-		if (oi) {
+-			if (oi->disk_sizep)
+-				*oi->disk_sizep = st.st_size;
+-			if (oi->mtimep)
+-				*oi->mtimep = st.st_mtime;
+-		}
+-
+-		ret = 0;
+-		goto out;
+-	}
+-
+-	fd = git_open(path);
+-	if (fd < 0) {
+-		if (errno != ENOENT)
+-			error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
+-		ret = -1;
+-		goto out;
+-	}
+-
+-	if (fstat(fd, &st)) {
+-		close(fd);
+-		ret = -1;
+-		goto out;
+-	}
+-
+-	mapsize = xsize_t(st.st_size);
+-	if (!mapsize) {
+-		close(fd);
+-		ret = error(_("object file %s is empty"), path);
+-		goto out;
+-	}
+-
+-	map = xmmap(NULL, mapsize, PROT_READ, MAP_PRIVATE, fd, 0);
+-	close(fd);
+-	if (!map) {
+-		ret = -1;
+-		goto out;
+-	}
+-
+-	if (oi->disk_sizep)
+-		*oi->disk_sizep = mapsize;
+-	if (oi->mtimep)
+-		*oi->mtimep = st.st_mtime;
+-
+-	stream_to_end = &stream;
+-
+-	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) {
+-	case ULHR_OK:
+-		if (!oi->sizep)
+-			oi->sizep = &size_scratch;
+-		if (!oi->typep)
+-			oi->typep = &type_scratch;
+-
+-		if (parse_loose_header(hdr, oi) < 0) {
+-			ret = error(_("unable to parse %s header"), oid_to_hex(oid));
+-			goto corrupt;
+-		}
+-
+-		if (*oi->typep < 0)
+-			die(_("invalid object type"));
+-
+-		if (oi->contentp) {
+-			*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+-			if (!*oi->contentp) {
+-				ret = -1;
+-				goto corrupt;
+-			}
+-		}
+-
+-		break;
+-	case ULHR_BAD:
+-		ret = error(_("unable to unpack %s header"),
+-			    oid_to_hex(oid));
+-		goto corrupt;
+-	case ULHR_TOO_LONG:
+-		ret = error(_("header for %s too long, exceeds %d bytes"),
+-			    oid_to_hex(oid), MAX_HEADER_LEN);
+-		goto corrupt;
+-	}
+-
+-	ret = 0;
+-
+-corrupt:
+-	if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
+-		die(_("loose object %s (stored in %s) is corrupt"),
+-		    oid_to_hex(oid), path);
+-
+-out:
+-	if (stream_to_end)
+-		git_inflate_end(stream_to_end);
+-	if (map)
+-		munmap(map, mapsize);
+-	if (oi) {
+-		if (oi->sizep == &size_scratch)
+-			oi->sizep = NULL;
+-		if (oi->typep == &type_scratch)
+-			oi->typep = NULL;
+-		if (oi->delta_base_oid)
+-			oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
+-		if (!ret)
+-			oi->whence = OI_LOOSE;
+-	}
+-
+-	return ret;
+-}
+-
+-int odb_source_loose_read_object_info(struct odb_source *source,
+-				      const struct object_id *oid,
+-				      struct object_info *oi,
+-				      enum object_info_flags flags)
+-{
+-	static struct strbuf buf = STRBUF_INIT;
+-
+-	/*
+-	 * The second read shouldn't cause new loose objects to show up, unless
+-	 * there was a race condition with a secondary process. We don't care
+-	 * about this case though, so we simply skip reading loose objects a
+-	 * second time.
+-	 */
+-	if (flags & OBJECT_INFO_SECOND_READ)
+-		return -1;
+-
+-	odb_loose_path(source, &buf, oid);
+-	return read_object_info_from_path(source, buf.buf, oid, oi, flags);
+-}
+-
+-=======
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+ static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
+ 			     const void *buf, size_t len,
+ 			     struct object_id *oid,
+@@ object-file.c: struct odb_transaction *odb_transaction_files_begin(struct odb_source *source)
+
+ 	return &transaction->base;
+ }
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-
+-struct odb_source_loose *odb_source_loose_new(struct odb_source *source)
+-{
+-	struct odb_source_loose *loose;
+-	CALLOC_ARRAY(loose, 1);
+-	loose->source = source;
+-	return loose;
+-}
+-
+-void odb_source_loose_free(struct odb_source_loose *loose)
+-{
+-	if (!loose)
+-		return;
+-	odb_source_loose_clear_cache(loose);
+-	loose_object_map_clear(&loose->map);
+-	free(loose);
+-}
+-
+-struct odb_loose_read_stream {
+-	struct odb_read_stream base;
+-	git_zstream z;
+-	enum {
+-		ODB_LOOSE_READ_STREAM_INUSE,
+-		ODB_LOOSE_READ_STREAM_DONE,
+-		ODB_LOOSE_READ_STREAM_ERROR,
+-	} z_state;
+-	void *mapped;
+-	unsigned long mapsize;
+-	char hdr[32];
+-	int hdr_avail;
+-	int hdr_used;
+-};
+-
+-static ssize_t read_istream_loose(struct odb_read_stream *_st, char *buf, size_t sz)
+-{
+-	struct odb_loose_read_stream *st =
+-		container_of(_st, struct odb_loose_read_stream, base);
+-	size_t total_read = 0;
+-
+-	switch (st->z_state) {
+-	case ODB_LOOSE_READ_STREAM_DONE:
+-		return 0;
+-	case ODB_LOOSE_READ_STREAM_ERROR:
+-		return -1;
+-	default:
+-		break;
+-	}
+-
+-	if (st->hdr_used < st->hdr_avail) {
+-		size_t to_copy = st->hdr_avail - st->hdr_used;
+-		if (sz < to_copy)
+-			to_copy = sz;
+-		memcpy(buf, st->hdr + st->hdr_used, to_copy);
+-		st->hdr_used += to_copy;
+-		total_read += to_copy;
+-	}
+-
+-	while (total_read < sz) {
+-		int status;
+-
+-		st->z.next_out = (unsigned char *)buf + total_read;
+-		st->z.avail_out = sz - total_read;
+-		status = git_inflate(&st->z, Z_FINISH);
+-
+-		total_read = st->z.next_out - (unsigned char *)buf;
+-
+-		if (status == Z_STREAM_END) {
+-			git_inflate_end(&st->z);
+-			st->z_state = ODB_LOOSE_READ_STREAM_DONE;
+-			break;
+-		}
+-		if (status != Z_OK && (status != Z_BUF_ERROR || total_read < sz)) {
+-			git_inflate_end(&st->z);
+-			st->z_state = ODB_LOOSE_READ_STREAM_ERROR;
+-			return -1;
+-		}
+-	}
+-	return total_read;
+-}
+-
+-static int close_istream_loose(struct odb_read_stream *_st)
+-{
+-	struct odb_loose_read_stream *st =
+-		container_of(_st, struct odb_loose_read_stream, base);
+-
+-	if (st->z_state == ODB_LOOSE_READ_STREAM_INUSE)
+-		git_inflate_end(&st->z);
+-	munmap(st->mapped, st->mapsize);
+-	return 0;
+-}
+-
+-int odb_source_loose_read_object_stream(struct odb_read_stream **out,
+-					struct odb_source *source,
+-					const struct object_id *oid)
+-{
+-	struct object_info oi = OBJECT_INFO_INIT;
+-	struct odb_loose_read_stream *st;
+-	unsigned long mapsize;
+-	void *mapped;
+-
+-	mapped = odb_source_loose_map_object(source, oid, &mapsize);
+-	if (!mapped)
+-		return -1;
+-
+-	/*
+-	 * Note: we must allocate this structure early even though we may still
+-	 * fail. This is because we need to initialize the zlib stream, and it
+-	 * is not possible to copy the stream around after the fact because it
+-	 * has self-referencing pointers.
+-	 */
+-	CALLOC_ARRAY(st, 1);
+-
+-	switch (unpack_loose_header(&st->z, mapped, mapsize, st->hdr,
+-				    sizeof(st->hdr))) {
+-	case ULHR_OK:
+-		break;
+-	case ULHR_BAD:
+-	case ULHR_TOO_LONG:
+-		goto error;
+-	}
+-
+-	oi.sizep = &st->base.size;
+-	oi.typep = &st->base.type;
+-
+-	if (parse_loose_header(st->hdr, &oi) < 0 || st->base.type < 0)
+-		goto error;
+-
+-	st->mapped = mapped;
+-	st->mapsize = mapsize;
+-	st->hdr_used = strlen(st->hdr) + 1;
+-	st->hdr_avail = st->z.total_out;
+-	st->z_state = ODB_LOOSE_READ_STREAM_INUSE;
+-	st->base.close = close_istream_loose;
+-	st->base.read = read_istream_loose;
+-
+-	*out = &st->base;
+-
+-	return 0;
+-error:
+-	git_inflate_end(&st->z);
+-	munmap(mapped, mapsize);
+-	free(st);
+-	return -1;
+-}
+-=======
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+
+ ## t/helper/test-delta.c ##
+ remerge CONFLICT (content): Merge conflict in t/helper/test-delta.c
+ index ad7a427bf5..d807afef75 100644
+ --- t/helper/test-delta.c
+ +++ t/helper/test-delta.c
+@@ t/helper/test-delta.c: int cmd__delta(int argc, const char **argv)
+ 		die_errno("unable to read '%s'", argv[3]);
+
+ 	if (argv[1][1] == 'd') {
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-		unsigned long delta_size;
+-=======
+ 		size_t delta_size;
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+ 		out_buf = diff_delta(from.buf, from.len,
+ 				     data.buf, data.len,
+ 				     &delta_size, 0);
+
+ ## t/helper/test-pack-deltas.c ##
+ remerge CONFLICT (content): Merge conflict in t/helper/test-pack-deltas.c
+ index 5360cc9e6d..959705feca 100644
+ --- t/helper/test-pack-deltas.c
+ +++ t/helper/test-pack-deltas.c
+@@ t/helper/test-pack-deltas.c: static void write_ref_delta(struct hashfile *f,
+ {
+ 	unsigned char header[MAX_PACK_OBJECT_HEADER];
+ 	unsigned long delta_size, compressed_size, hdrlen;
+-<<<<<<< f3aeae983a (odb: use size_t for object_info.sizep and the size APIs)
+-	size_t size, base_size;
+-=======
+ 	size_t size, base_size, delta_size_st = 0;
+->>>>>>> c1d354114f (git-zlib: widen git_deflate_bound() to size_t)
+ 	enum object_type type;
+ 	void *base_buf, *delta_buf;
+ 	void *buf = odb_read_object(the_repository->objects,
To: 32f31d4f48 (Continue improving support for 4GB+ packs/clones/objects (git-for-windows#6289), 2026-06-23) (457ab90ae5..32f31d4f48)
Statistics