asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taewoo Kim (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: ASTERIXDB-1566, ASTERIXDB-1733: Hash Group By and Hash Join ...
Date Wed, 04 Jan 2017 00:18:28 GMT
Taewoo Kim has posted comments on this change.

Change subject: ASTERIXDB-1566, ASTERIXDB-1733: Hash Group By and Hash Join conform to the
memory budget
......................................................................


Patch Set 60:

(17 comments)

Thanks Yingyi.

https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java
File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/ExternalGroupByPOperator.java:

Line 330:         BigInteger groupByPrimeNumberTableSize = BigInteger.valueOf(finalGroupByTableCardinality).nextProbablePrime();
> nextProbablePrime is very expensive.  I tried on my local machine and it us
It was Mike's idea that I agreed. I just talked to Mike and he agreed. This is removed.


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java:

Line 60:     private static final int MIN_OUTPUT_FRAME_LIMT = 1;
> MIN_OUTPUT_FRAME_LIMIT =1
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/external/ExternalGroupOperatorDescriptor.java:

Line 70:              * Minimum of 4 frames: 1 for input records, 1 for output, and 2 for
hash table (1 header and 1 content)
> If minimum is 4, the condition should frameLimit <= 3.  Correct?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/InMemoryHashJoin.java:

Line 132:                         "Can't insert an entry into hash table. Please assign more
memory to InMemoryHashJoin.");
> The error message is a bit hard to follow.
As we both know, the old contract works because it ignores the hash table footprint in memory.
The contract is not changed. I just added one more safety check here.


Line 211:         LOGGER.fine("InMemoryHashJoin has finished using " + nFrames + " frames
for Thread ID "
> Check logging level so that we don't always do string concatenation.
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/join/OptimizedHybridHashJoin.java:

Line 310:         if (freeSpace <= 0) {
> <=0  -->  <0 ?
Done


Line 336:                     long expectedHashTableSizeDecrease = hashTableByteSizeForInMemTuples
> you can calculate the expectedHashTableSizeDecrease without inMemTupCount.
Since ceil() is used to calculate the total number of frames that are required for the given
hash table, slightly changing the number can affects the number of frames.


Line 342:                     if (freeSpace > 0) {
> freeSpace > 0  -->  freeSpace >= 0 ?
Done


Line 405:     private int selectPartitionsToReload(long freeSpace, int pid, int inMemTupCount,
long originalHashTableSize) {
> Why do you need 
Since ceil() is used to calculate the total number of frames that are required for the given
hash table, slightly changing the number can affects the number of frames.


Line 412:             if (freeSpace > buildRFWriters[i].getFileSize() + expectedHashTableSizeIncrease)
{
> > -->  >= ?
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java
File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/structures/SimpleSerializableHashTable.java:

Line 490:             bytes[offset++] = (byte) (value);
> Address this?  Remove the last ++.
Done


Line 490:             bytes[offset++] = (byte) (value);
> MAJOR SonarQube violation:
Done


https://asterix-gerrit.ics.uci.edu/#/c/1056/60/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java
File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/unit/AbstractExternalGroupbyTest.java:

Line 180:         int frameSize = 256;
> Is 50 too large?
Header: 4 pages, Content: 101 * 40 = 4040 = 15.78 = 16 pages
Changed it to 23.


Line 189:         int tableSize = 101;
> Is 50 too large?
Done


Line 192:         int minDataSize = frameSize * 80;
> Is 80 too large?
Adjusted to 40 after changing the frame count to 23.


Line 202:         int frameSize = 256;
> Is 50 too large?
Done


Line 203:         int minDataSize = frameSize * 80;
> Is 80 too large?
Adjusted to 40 after changing the frame count to 23.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1056
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Gerrit-PatchSet: 60
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangsaeu@yahoo.com>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Michael Blow <mblow@apache.org>
Gerrit-Reviewer: Steven Jacobs <sjaco002@ucr.edu>
Gerrit-Reviewer: Taewoo Kim <wangsaeu@yahoo.com>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-Reviewer: Wenhai Li <lwhaymail@yahoo.com>
Gerrit-Reviewer: Yingyi Bu <buyingyi@gmail.com>
Gerrit-HasComments: Yes

Mime
View raw message