Skip to content

feat(storage): Add full object checksum validation for appendable uploads#16219

Open
v-pratap wants to merge 11 commits into
googleapis:mainfrom
v-pratap:fastbyte-full-object-checksum-final
Open

feat(storage): Add full object checksum validation for appendable uploads#16219
v-pratap wants to merge 11 commits into
googleapis:mainfrom
v-pratap:fastbyte-full-object-checksum-final

Conversation

@v-pratap

Copy link
Copy Markdown
Contributor

No description provided.

@v-pratap v-pratap requested review from a team as code owners June 30, 2026 06:34
@product-auto-label product-auto-label Bot added the api: storage Issues related to the Cloud Storage API. label Jun 30, 2026
@v-pratap v-pratap requested a review from kalragauri June 30, 2026 06:34
@v-pratap v-pratap force-pushed the fastbyte-full-object-checksum-final branch from 3aad347 to 915cef5 Compare June 30, 2026 06:35

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for CRC32C checksum validation during asynchronous uploads, specifically during finalization, resume, and flush operations. It updates the AsyncWriter interface and its underlying connection implementations to accept and verify expected checksums, and enhances HashFunction to support state retrieval and restoration. The review feedback highlights three critical issues: a compilation error in AsyncWriterConnectionImpl::Finalize caused by inconsistent return types in a lambda, and two data races in AsyncWriterConnectionResumed where impl_ is accessed concurrently without holding the appropriate mutex lock.

Comment thread google/cloud/storage/internal/async/writer_connection_impl.cc
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc
@v-pratap v-pratap added the do not review Indicates a PR is not ready for review label Jun 30, 2026
@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.66762% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.22%. Comparing base (559b727) to head (811bb66).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...torage/internal/async/writer_connection_resumed.cc 75.25% 24 Missing ⚠️
...le/cloud/storage/internal/async/connection_impl.cc 62.06% 11 Missing ⚠️
google/cloud/storage/internal/hash_function_impl.h 42.85% 8 Missing ⚠️
...orage/internal/async/writer_connection_buffered.cc 54.54% 5 Missing ⚠️
google/cloud/storage/internal/hash_function.h 0.00% 4 Missing ⚠️
google/cloud/storage/async/writer.cc 71.42% 2 Missing ⚠️
...torage/internal/async/writer_connection_tracing.cc 60.00% 2 Missing ⚠️
...e/internal/async/writer_connection_resumed_test.cc 98.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16219      +/-   ##
==========================================
- Coverage   92.23%   92.22%   -0.02%     
==========================================
  Files        2265     2265              
  Lines      210210   210519     +309     
==========================================
+ Hits       193894   194155     +261     
- Misses      16316    16364      +48     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@v-pratap v-pratap removed the do not review Indicates a PR is not ready for review label Jun 30, 2026
Comment thread google/cloud/storage/internal/async/writer_connection_impl.cc Outdated
@v-pratap v-pratap force-pushed the fastbyte-full-object-checksum-final branch from 006eeda to 9fca72d Compare July 1, 2026 13:35
@v-pratap v-pratap requested a review from kalragauri July 1, 2026 13:47
Comment thread google/cloud/storage/internal/async/writer_connection_impl.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
@v-pratap v-pratap requested a review from kalragauri July 2, 2026 12:17
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
Comment thread google/cloud/storage/internal/async/writer_connection_resumed.cc Outdated
std::unique_lock<std::mutex> lk(self->mu_);
auto checksums = self->Impl(lk)->PersistedChecksums();
if (checksums && checksums->has_crc32c()) {
self->RestoreChecksumState(persisted_size);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call would rewind the minimum_offset_ of the main hash_function_ to persisted_size using RestoreCrc32c but AFAICT, we do not update the write_offset_ (since we are not resuming the stream). When the client sends the next chunk, it calls hash_function_->Update(offset, ...) using the old offset. while the hash function expects persisted_size.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewinding the main hash function during an active flush breaks ongoing writes. To fix this, I moved the map check into a helper (EnsureCrc32cHistory) that calculates and records the CRC at persisted_size without touching or rewinding the active hash function's offset. We now only rewind the main hash function during an actual stream resume (OnQuery).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the new helper.

OnQuery is not exclusive to stream resumes, it is also called inside FlushStep, and inside OnQuery, RestoreChecksumState is called unconditionally at the very beginning. If the server returns a persisted_size that lags behind what the client sent, we would rewind the main hash function. The next write on the stream will fail with a mismatched offset.

Inside OnQuery, should we call RestoreChecksumState after checking if (state_ == State::kResuming)?

}

if (crc32c && md5) {
return std::make_shared<storage::internal::CompositeFunction>(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD5HashFunction does not implement RestoreCrc32c, which is expected. What happens when RestoreChecksumState rewinds the offset for CRC32C and then attempts to advance the hash using hash_function_->Update(y, ...)? Will MD5HashFunction::Update throw an InvalidArgumentError because its offset was never rewound?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, since MD5 can't be rewound, calling .Update() on the combined hash function would cause MD5 to fail. Our new helper solves this by extending the CRC32C value directly (using ExtendCrc32c), so we never call .Update() or touch MD5 when checking unaligned boundaries.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, MD5 will still crash when the client resumes writing. Since MD5HashFunction does not implement RestoreCrc32c, the MD5 minimum_offset_ remains at its old value.

Potential solution: when resuming an upload that uses MD5, we recreate the CompositeFunction from scratch using the server's persisted CRC32C, rather than passing the same mutated hash_function_ down.

Consider verifying current behavior using a unit test using the composite hash before making changes to the implementation.

…tate to avoid offset rewinding and MD5 errors
@v-pratap v-pratap requested a review from kalragauri July 3, 2026 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the Cloud Storage API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants