feat(data-retention): granular PII redaction stages (input + block outputs)#5272
feat(data-retention): granular PII redaction stages (input + block outputs)#5272TheodoreSpeaks wants to merge 16 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Runtime: Workflow input is masked before execution when the input stage is on. Block outputs are masked in-flight (before compaction and downstream blocks), including buffer-only streaming so raw chunks are not forwarded when that stage is enabled; policy propagates to child workflows and agent memory writes. Input/block stages abort the run on Presidio failure ( Presidio / batching: New Settings & contracts: Data retention UI uses stage tabs, language-filtered entity grids, and confirm-on-remove for overrides; API/schema accept Reviewed by Cursor Bugbot for commit f0c71cc. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR adds granular PII redaction stages for workflow execution and logs. The main changes are:
Confidence Score: 4/5This is close, but the restored large-value path should be fixed before merging.
apps/sim/lib/workflows/executor/execution-core.ts
|
| Filename | Overview |
|---|---|
| apps/sim/lib/workflows/executor/execution-core.ts | Adds execution-time policy resolution and restore masking, but restored large-value refs can still bypass block-output redaction. |
| apps/sim/executor/execution/block-executor.ts | Masks block outputs before compaction and prevents raw streaming chunks from being forwarded when block-output redaction is enabled. |
| apps/sim/lib/logs/execution/pii-large-values.ts | Adds hydrate, mask, and re-store handling for offloaded values in the log redaction path. |
| apps/sim/lib/billing/retention.ts | Resolves per-stage PII policy with stored rules as the execution-time source of truth. |
Reviews (12): Last reviewed commit: "fix(data-retention): always apply logs p..." | Re-trigger Greptile
|
@greptile review |
|
@greptile review |
…nconditionally (no fail-open)
|
@greptile review |
|
@greptile review |
…redaction # Conflicts: # apps/sim/ee/data-retention/components/data-retention-settings.tsx
|
@greptile review |
|
@greptile review |
|
@greptile review |
… = off everywhere
…block-output redaction is off
|
@greptile review |
| // Limitation: this walks inline strings only — values offloaded to | ||
| // large-value storage are still refs here and are not re-masked. In the | ||
| // normal flow that is safe (a run with the stage on masks before offload); | ||
| // the gap is the narrow case of a run that offloaded a large value while | ||
| // the stage was OFF and is resumed after the stage is turned ON. |
There was a problem hiding this comment.
When block-output redaction is enabled after a workflow already offloaded large block outputs, this restore path only masks inline strings in the snapshot. The offloaded payloads stay behind large-value refs. On resume or run-from-block, downstream blocks can still read the raw restored payload, and log persistence can skip the large-value scrub because block-output redaction is now enabled. This leaves raw PII reachable from prior block outputs after the stage is turned on.
| abortSignal: ctx.abortSignal, | ||
| // Propagate in-flight block-output redaction into child workflows so | ||
| // nested blocks mask outputs too (recurses: each child forwards it). | ||
| piiBlockOutputRedaction: ctx.piiBlockOutputRedaction, |
There was a problem hiding this comment.
Child workflows skip input redaction
Medium Severity
The new workflow-input PII stage runs only in executeWorkflowCore on top-level processedInput. Nested child runs are started with a direct Executor and pass childWorkflowInput unchanged. Only the block-output policy is forwarded on the context, so when the input stage is on and block outputs are off, mapped or explicit child input can execute and produce downstream state without in-flight input masking.
Reviewed by Cursor Bugbot for commit 8f86d77. Configure here.
…logs (preserve redacted content)
|
@greptile review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6e9587a. Configure here.
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
Refs stay unmasked When a paused run or run-from-block snapshot contains a large-value ref that was created before block-output redaction was enabled, this call only masks inline strings. Large-value refs are treated as opaque by
redactObjectStrings, so the ref still points at the original offloaded bytes. The later warm-up step can materialize that raw value for downstream blocks, letting them read or send unredacted PII even though the block-output stage is enabled.
…n logs stage is on
|
@greptile review |
| snapshot.state.blockStates = await redactObjectStrings( | ||
| snapshot.state.blockStates, | ||
| blockOutputOpts | ||
| ) |
There was a problem hiding this comment.
This restore path still only masks inline strings. When a paused run or run-from-block snapshot contains a large-value ref created before block-output redaction was enabled, redactObjectStrings leaves the ref untouched. The later warm-up can materialize that original offloaded value for downstream blocks, so the resumed workflow can read raw PII even though block-output redaction is now enabled. This path needs to hydrate, mask, and re-store restored refs before downstream state can use them.


Summary
Type of Change
Testing
Tested manually. Unit tests for resolver back-compat,
redactObjectStrings+ failure modes, and the contract schema.bun run lint,check:api-validation:strict, andcheck:migrations origin/stagingall pass.Checklist