asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jianfeng Jia (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: ASTERIXDB-1566: Fixed External Hash Group By to conform to t...
Date Tue, 16 Aug 2016 07:43:28 GMT
Jianfeng Jia has posted comments on this change.

Change subject: ASTERIXDB-1566: Fixed External Hash Group By to conform to the memory budget
......................................................................


Patch Set 5:

(14 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml
File asterixdb/asterix-app/src/main/resources/asterix-build-configuration.xml:

Line 71:     <value>64KB</value>
should the user care about the internal hashtable memory limit?
I thought it could be automatically changed.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java
File asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/AsterixCompilerProperties.java:

Line 44:     private static final long COMPILER_EXTERNAL_GROUP_TABLE_SIZE_DEFAULT = 16 <<
20; // 16MB
just want to clarify, is this the cardinality value? or the physical bytes?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/asterixdb/asterix-docker/docker/asterix-configuration.xml
File asterixdb/asterix-docker/docker/asterix-configuration.xml:

Line 239:     <name>compiler.groupmemory</name>
same as Chen's thought, I think the `groupbymemory` would be more meaningful.


Line 250:       In an external hash group-by operation, a hash table is used to insert/traverse
the group-by value
If I were the user, I would not read a description like this long :-)


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/IPartitionedTupleBufferManager.java:

Line 30:     // Return the number of entire frames that are allocated to this buffer manager.
use JavaDoc for Interface?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java:

Line 52:         // If we couldn't find an already spilled partition, or it is too small to
flush that one,
personal opition: a large chunk of comments inside the code breaks the code readability. It's
will be very hard for readers to navigate the actual code. 

It's better to move these two chunks on top of the function as a function description.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/VPartitionTupleBufferManager.java:

Line 50:     // total number of frames that are allocated
code tells the comment.


Line 175:         } else {
don't need `else`?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 68:         int expectedByteSizeOfHashTableForGroupBy = SerializableHashTable.getUnitSize()
should it be convenient to have a corresponding function inside the `SerializableHashTable`?


Line 179:                 if (isUsedNumFramesExceedBudget()) {
it's a ideal time to check :-)


Line 254:                 if (getNumFrames() <= framesLimit) {
`return getNumFrames() > framesLimit` ?


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalHashGroupBy.java:

Line 64:                         // If the result type is SUCCESS_BUT_EXCEEDS_BUDGET,
Again, the large chunk of comment breaks the code flow. It's better to merge them into as
one paragraph and add it on top the function.


Line 74:                         // FAIL case - try to insert again
code tells.


https://asterix-gerrit.ics.uci.edu/#/c/1056/5/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java
File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java:

Line 179:         // With the above table size, there can be 1001 * 40 / 256 (= 156) number
of header frames in Hash Table.
this is the function description.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangsaeu@yahoo.com>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-Reviewer: Wenhai Li <lwhaymail@yahoo.com>
Gerrit-Reviewer: Yingyi Bu <buyingyi@gmail.com>
Gerrit-HasComments: Yes

Mime
View raw message