bug: clear map contents after emit in GroupValuesColumns#23262
bug: clear map contents after emit in GroupValuesColumns#23262Rich-T-kid wants to merge 3 commits into
Conversation
|
From my understanding spilling causes |
| let fresh = Self::build_group_columns(&self.schema)?; | ||
| let group_values = mem::replace(&mut self.group_values, fresh); | ||
|
|
||
| self.map.clear(); |
There was a problem hiding this comment.
Can you please add a comment here?
What is the end user implication of this change? less peak memory usage?
There was a problem hiding this comment.
Not exactly,the map's capacity is preserved. This gives the caller a correctness guarantee: emit will never leave GroupValuesColumn in an intermediate or inconsistent state.
There was a problem hiding this comment.
Can emit ever be called more than once with EmitTo::All ? That seems like it would be a bug right there
There was a problem hiding this comment.
EmitTo::All gets called in two cases
-
memory is filling up, leading to a spill.
EmitTo::Allis called here and then its written to disk. afterEmitTo::Allis calledGroupValues::clear_shrink()gets called and thats when the map content gets cleared. iircEmitTo::Allcan be called multiple times for spills. This wont matter as each call is paired withGroupValues::clear_shrink() -
all input has been seen, begin producing output.
EmitTo::allis called here is called to produced all of the arrays and then gets repeatedly sliced. After this theexec_stateis set to producing output so intern is never called again.
Currently in both cases due to the ordering of surrounding code the state of groupValuesColumns is logically correct.
🤔 It may not be worth adding this map.clear() in this case. My thought was to try to prevent possible invariant violations, but if clear_shrink(0) isn't called, or if more input comes in after the state changes to producing_output, there may be larger logical inconsistencies.
| let fresh = Self::build_group_columns(&self.schema)?; | ||
| let group_values = mem::replace(&mut self.group_values, fresh); | ||
|
|
||
| self.map.clear(); |
There was a problem hiding this comment.
Can emit ever be called more than once with EmitTo::All ? That seems like it would be a bug right there
Which issue does this PR close?
Rationale for this change
I was implementing dictionary support for this trait when I noticed the
emit(emit_to::All)path doesn't update the contents of the the hashtable/mapWhat changes are included in this PR?
This PR adds test to catch future regressions as well as a simple
self.map.clear()to clear the contents of the map on emitAre these changes tested?
yes, this PR adds two test
Are there any user-facing changes?
no