impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Date Thu, 16 Feb 2017 04:21:23 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6025/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 146: // We assume that the dst buffer has already been allocated earlier. This increases
> can simplify with:
Done


Line 150:   uint8_t* ptr = ctx->Reallocate(dst->ptr, buf_len);
> DCHECK(dst->ptr != NULL);
Done


Line 918: const static int NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
> NUM_SAMPLES -> MAX_NUM_SAMPLES
Done


Line 974:   ReservoirSample<T>* at(int64_t idx) {
> single line, const function
Done


Line 979:   void push_back(ReservoirSample<T> s) {
> use const&, otherwise 's' is copied twice
Done


Line 982:     ReservoirSample<T>* arr = reinterpret_cast<ReservoirSample<T>*>(begin());
> no need to reinterpret_cast
Done


Line 987:   ReservoirSample<T>* begin() {
> const function (and elsewhere)
Done


Line 994:   ReservoirSample<T>* end() {
> single line
Done


Line 1005: void resizeReservoirSampleState(FunctionContext* ctx, StringVal* buffer) {
> Why not move this function into ReservoirSampleState and make ReservoirSamp
Unfortunately it's not possible to encapsulate because we also need to update the pointer
in the StringVal buffer. It would be more complicated and error prone to pass it as one of
parameters into this function.


Line 1007:   // can fit 10 times as many samples, up to a maximum of 20,000. We do not allocate
> instead of 20,000 use the constant MAX_NUM_SAMPLES
Done


Line 1033:   // If the array gets filled due to updates or merges, we will reallocate a larger
> suggest "we reallocate a larger buffer to hold up to a maximum of MAX_NUM_S
Done


Line 1035:   int init_size = (NUM_SAMPLES / 1000);
> It would be nice to have this encapsulated in the ReservoirSampleState as w
As mentioned above, it's not possible to encapsulate cleanly.


Line 1061:       resizeReservoirSampleState<T>(ctx, dst);
> if ReallocBuffer() failed, dst->ptr will be in an undefined state, and may 
I just added a check here, because it's not possible to move the logic as you suggested.


Line 1147:   DCHECK_EQ(src_val.len, sizeof(ReservoirSampleState<T>) +
> you can add a helper function to ReservoirSampleState to do this size compu
Done. Also added IsFull().


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message