asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taewoo Kim (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: ASTERIXDB-1628: Fixed an issue in External Hash Group by
Date Sat, 03 Sep 2016 01:58:33 GMT
Taewoo Kim has uploaded a new change for review.

  https://asterix-gerrit.ics.uci.edu/1144

Change subject: ASTERIXDB-1628: Fixed an issue in External Hash Group by
......................................................................

ASTERIXDB-1628: Fixed an issue in External Hash Group by

 - The number of partitions in External Hash Group By is now
   properly calculated by considering a corner case.

Change-Id: I8901d2b64659fb0d2b97d73f45a9fe113232e860
---
M hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
1 file changed, 11 insertions(+), 7 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/44/1144/1

diff --git a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
index f08d27d..85a7609 100644
--- a/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
+++ b/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/HashSpillableTableFactory.java
@@ -238,17 +238,21 @@
         };
     }
 
-    private int getNumOfPartitions(int nubmerOfFramesForData, int frameLimit) {
-        if (frameLimit > nubmerOfFramesForData) {
+    private int getNumOfPartitions(int nubmerOfFramesForDataAndHashTable, int frameLimit)
{
+        if (frameLimit >= nubmerOfFramesForDataAndHashTable * FUDGE_FACTOR) {
             return 1; // all in memory, we will create a big partition
         }
+        // The formula is based on Shapiro's paper - http://cs.stanford.edu/people/chrismre/cs345/rl/shapiro.pdf.
+        // Check the page 249 for more details.
         int numberOfPartitions = (int) (Math
-                .ceil((nubmerOfFramesForData * FUDGE_FACTOR - frameLimit) / (frameLimit -
1)));
-        if (numberOfPartitions <= 0) {
-            numberOfPartitions = 1; //becomes in-memory hash
-        }
+                .ceil((nubmerOfFramesForDataAndHashTable * FUDGE_FACTOR - frameLimit) / (frameLimit
- 1)));
+        // Actually, at this stage, we know that this is not a in-memory hash (#frames required
> #frameLimit).
+        // So we want to guarantee that the number of partition is at least two because there
is a corner case.
+        numberOfPartitions = Math.max(2, numberOfPartitions);
+        // If the number of partitions is greater than the memory budget, there might be
a case that we can't
+        // allocate at least one frame for each partition in memory. So, we deal with those
cases here.
         if (numberOfPartitions > frameLimit) {
-            numberOfPartitions = (int) Math.ceil(Math.sqrt(nubmerOfFramesForData * FUDGE_FACTOR));
+            numberOfPartitions = (int) Math.ceil(Math.sqrt(nubmerOfFramesForDataAndHashTable
* FUDGE_FACTOR));
             return Math.max(2, Math.min(numberOfPartitions, frameLimit));
         }
         return numberOfPartitions;

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1144
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8901d2b64659fb0d2b97d73f45a9fe113232e860
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangsaeu@yahoo.com>

Mime
View raw message