impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Date Thu, 16 Feb 2017 01:01:31 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage

Patch Set 1:


Did you verify the memory savings with a simple experiment?
File be/src/exprs/

Line 146: // We assume that the dst buffer has already been allocated earlier. This increases
can simplify with:

Same as AllocBuffer() above but for re-allocating 'dst->ptr'.

Line 150:   uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len);
DCHECK(dst->ptr != NULL);


Line 974:   ReservoirSample<T>* at(int64_t idx) {
single line, const function

Line 979:   void push_back(ReservoirSample<T> s) {
use const&, otherwise 's' is copied twice

Line 982:     ReservoirSample<T>* arr = reinterpret_cast<ReservoirSample<T>*>(begin());
no need to reinterpret_cast

Line 987:   ReservoirSample<T>* begin() {
const function (and elsewhere)

Line 994:   ReservoirSample<T>* end() {
single line

Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) {
Why not move this function into ReservoirSampleState and make ReservoirSampleState auto-resize
itself during push_back()? You can add ctx as a param to push_back.

That way all the logic is encapsulated inside ReservoirSampleState and not throughout various
agg fn implementations

Line 1007:   // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate
instead of 20,000 use the constant MAX_NUM_SAMPLES

Line 1033:   // If the array gets filled due to updates or merges, we will reallocate a larger
suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_SAMPLES ReservoirSamples."

Line 1035:   int init_size = (NUM_SAMPLES / 1000);
It would be nice to have this encapsulated in the ReservoirSampleState as well.

Line 1061:       resizeReservoirSampleState<T>(ctx, dst);
if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may cause a crash

I suggest moving the reallocation logic into the push_back() of ReservoirSampleState. You
might use the return code of push_back() to signal success/failure, so you can handle the
failure case here.

Line 1147:   DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState<T>) +
you can add a helper function to ReservoirSampleState to do this size computation

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-HasComments: Yes

View raw message