impala-reviews mailing list archives

From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4787: Optimize APPX_MEDIAN() memory usage
Date Fri, 24 Feb 2017 04:02:48 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6025/8//COMMIT_MSG
Commit Message:

PS8, Line 22: No new tests were added,
> can update this now
Done


http://gerrit.cloudera.org:8080/#/c/6025/8/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 126: // TODO: this file should be cross compiled and then all of the builtin
> is this done?
This file's name ends in "-ir" and it's in be/src/codegen/impala-ir.cc, so this must already be done.
Removed the TODO.


Line 961: struct ReservoirSampleState {
> i'd say this really turned into a class
Done


Line 967:   // resize occurs, this needs to be updated from the outside.
> what does 'from the outside' mean?
I meant that whoever resizes the buffer and memcpys this struct over is also responsible for
updating the capacity. It might be clearer if I remove that part.


Line 977:   ReservoirSampleState(int init_capacity) :
> use standard formatting
Done.


Line 1016:     // The array of ReservoirSamples starts right after ReservoirSampleState, so we use
> that's often done by putting an array of size 1 at the end of the header st
Done. Also made some of the functions non-const because we don't want a function like GetSample()
to return a const ReservoirSample<T>*.


Line 1025:   int64_t GetNext64(int64_t max) {
> while you're at it, this deserves a comment
Done


Line 1033:   // Given a buffer that contains a ReservoirSampleState, resize the buffer so that it's
> its
nice catch


Line 1040:   if (new_capacity * 2 >= MAX_CAPACITY) new_capacity = MAX_CAPACITY;
> if state->capacity is 10 and max_capacity is 40, this line sets new_capacit
With the current constants, we would be resizing from about 8,000 to 20,000. I think this is
acceptable. Would it be better to resize to 16,000 and then to 20,000?


Line 1062:   // If the array gets filled due to updates or merges, we reallocate a larger buffer to
> you should put this (= a brief description of what you're doing) somewhere 
I moved this description higher up. I kept it mostly unmodified.


http://gerrit.cloudera.org:8080/#/c/6025/8/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

PS8, Line 1163: mediam
> spelling
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes
