feat(storage): Add full object checksum validation for appendable uploads#16219
feat(storage): Add full object checksum validation for appendable uploads#16219v-pratap wants to merge 11 commits into
Conversation
3aad347 to
915cef5
Compare
There was a problem hiding this comment.
Code Review
This pull request adds support for CRC32C checksum validation during asynchronous uploads, specifically during finalization, resume, and flush operations. It updates the AsyncWriter interface and its underlying connection implementations to accept and verify expected checksums, and enhances HashFunction to support state retrieval and restoration. The review feedback highlights three critical issues: a compilation error in AsyncWriterConnectionImpl::Finalize caused by inconsistent return types in a lambda, and two data races in AsyncWriterConnectionResumed where impl_ is accessed concurrently without holding the appropriate mutex lock.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #16219 +/- ##
==========================================
- Coverage 92.23% 92.22% -0.02%
==========================================
Files 2265 2265
Lines 210210 210519 +309
==========================================
+ Hits 193894 194155 +261
- Misses 16316 16364 +48 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
… in appendable uploads
…overloads without default arguments
…ck slice calculations
006eeda to
9fca72d
Compare
| std::unique_lock<std::mutex> lk(self->mu_); | ||
| auto checksums = self->Impl(lk)->PersistedChecksums(); | ||
| if (checksums && checksums->has_crc32c()) { | ||
| self->RestoreChecksumState(persisted_size); |
There was a problem hiding this comment.
This call would rewind the minimum_offset_ of the main hash_function_ to persisted_size using RestoreCrc32c but AFAICT, we do not update the write_offset_ (since we are not resuming the stream). When the client sends the next chunk, it calls hash_function_->Update(offset, ...) using the old offset. while the hash function expects persisted_size.
There was a problem hiding this comment.
Rewinding the main hash function during an active flush breaks ongoing writes. To fix this, I moved the map check into a helper (EnsureCrc32cHistory) that calculates and records the CRC at persisted_size without touching or rewinding the active hash function's offset. We now only rewind the main hash function during an actual stream resume (OnQuery).
There was a problem hiding this comment.
Thanks for adding the new helper.
OnQuery is not exclusive to stream resumes, it is also called inside FlushStep, and inside OnQuery, RestoreChecksumState is called unconditionally at the very beginning. If the server returns a persisted_size that lags behind what the client sent, we would rewind the main hash function. The next write on the stream will fail with a mismatched offset.
Inside OnQuery, should we call RestoreChecksumState after checking if (state_ == State::kResuming)?
| } | ||
|
|
||
| if (crc32c && md5) { | ||
| return std::make_shared<storage::internal::CompositeFunction>( |
There was a problem hiding this comment.
MD5HashFunction does not implement RestoreCrc32c, which is expected. What happens when RestoreChecksumState rewinds the offset for CRC32C and then attempts to advance the hash using hash_function_->Update(y, ...)? Will MD5HashFunction::Update throw an InvalidArgumentError because its offset was never rewound?
There was a problem hiding this comment.
Exactly, since MD5 can't be rewound, calling .Update() on the combined hash function would cause MD5 to fail. Our new helper solves this by extending the CRC32C value directly (using ExtendCrc32c), so we never call .Update() or touch MD5 when checking unaligned boundaries.
There was a problem hiding this comment.
IIUC, MD5 will still crash when the client resumes writing. Since MD5HashFunction does not implement RestoreCrc32c, the MD5 minimum_offset_ remains at its old value.
Potential solution: when resuming an upload that uses MD5, we recreate the CompositeFunction from scratch using the server's persisted CRC32C, rather than passing the same mutated hash_function_ down.
Consider verifying current behavior using a unit test using the composite hash before making changes to the implementation.
…tate to avoid offset rewinding and MD5 errors
No description provided.