asterixdb-dev mailing list archives

From Chen Li <che...@gmail.com>
Subject Re: Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.
Date Tue, 22 Nov 2016 05:40:00 GMT
+1

On Sat, Nov 19, 2016 at 12:32 PM, Mike Carey <dtabass@gmail.com> wrote:

> +1 for removal of now-defunct operations!
>
>
>
> On 11/19/16 12:01 PM, Taewoo Kim wrote:
>
>> Hi all,
>>
>> Please share your thoughts on this issue. In short, Grace Hash Join and
>> Hybrid Hash Join are not being used. We only use Optimized Hybrid Hash
>> Join. Therefore, I think it would be better to remove them.
>> https://issues.apache.org/jira/browse/ASTERIXDB-1736
>> ---------- Forwarded message ----------
>> From: Taewoo Kim (JIRA) <jira@apache.org>
>> Date: Fri, Nov 18, 2016 at 5:06 PM
>> Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash
>> Join are not being used.
>> To: notifications@asterixdb.incubator.apache.org
>>
>>
>> Taewoo Kim created ASTERIXDB-1736:
>> -------------------------------------
>>
>>               Summary: Grace Hash Join and Hybrid Hash Join are not being used.
>>                   Key: ASTERIXDB-1736
>>                   URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
>>               Project: Apache AsterixDB
>>            Issue Type: Improvement
>>              Reporter: Taewoo Kim
>>              Assignee: Taewoo Kim
>>
>>
>> As the title says, Grace Hash Join and Hybrid Hash Join are not being
>> used.
>> I suggest that we remove these two join methods. Here are my findings for
>> these two joins.
>>
>> 1) Grace Hash Join
>> GraceHashJoinOperatorDescriptor is only called from two places:
>> org.apache.hyracks.examples.tpch.client.join and
>> TPCHCustomerOrderHashJoinTest.
>> One is a Hyracks example (tpch.client) and the other is a unit test. This
>> join is not used currently (not chosen during the compilation).
>>
>> 2) Hybrid Hash Join
>> During the compilation, the optimizer decides whether it will use Hybrid
>> Hash Join or Optimized Hybrid Hash Join.
>> If the hash function family for each key variable is set, then we use the
>> optimized hybrid hash join.
>> If not, we use the hybrid hash join. In practice, however, the hybrid
>> hash join path is never chosen. Let's check the code.
>>
>> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}
>>          IBinaryHashFunctionFamily[] hashFunFamilies =
>>                  JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch, env, context);
>>
>>          ...
>>
>>          boolean optimizedHashJoin = true;
>>          for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>>              if (family == null) {
>>                  optimizedHashJoin = false;
>>                  break;
>>              }
>>          }
>>
>>          if (optimizedHashJoin) {
>>              opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>>                      hashFunFamilies, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>>          } else {
>>              opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>>                      hashFunFactories, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>>          }
>> {code}
>>
>> As we can see, optimizedHashJoin is set to false only when a hash function
>> family is null.
>> So how do we assign the hash function family for each key variable?
>>
>> {code:title=JobGenHelper.java|borderStyle=solid}
>>      public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
>>              Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
>>                      throws AlgebricksException {
>>          IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
>>          int i = 0;
>>          IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
>>          for (LogicalVariable var : varLogical) {
>>              Object type = env.getVarType(var);
>>              funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>>          }
>>          return funFamilies;
>>      }
>> {code}
>>
>> For each variable type, we try to get a hash function family. In the current
>> codebase, AqlBinaryHashFunctionFamilyProvider is the only class that
>> implements IBinaryHashFunctionFamilyProvider.
>> And for any type, it returns AMurmurHash3BinaryHashFunctionFamily.
>> So, there is no way that the hash function family is null.
>>
>> {code:title= AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
>> public class AqlBinaryHashFunctionFamilyProvider
>>          implements IBinaryHashFunctionFamilyProvider, Serializable {
>>
>>      private static final long serialVersionUID = 1L;
>>      public static final AqlBinaryHashFunctionFamilyProvider INSTANCE =
>>              new AqlBinaryHashFunctionFamilyProvider();
>>
>>      private AqlBinaryHashFunctionFamilyProvider() {
>>
>>      }
>>
>>      @Override
>>      public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
>>          // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
>>          return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>>      }
>>
>> }
>> {code}
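For anyone who wants to see the argument above run on its own, here is a minimal sketch. The names in it (HashJoinChoiceSketch, HashFunctionFamily, HashFunctionFamilyProvider) are hypothetical stand-ins, not the real Hyracks/Algebricks types, and the key types are placeholder strings. It only illustrates that a provider returning the same non-null family for every type can never make the null check fail:

```java
import java.util.Arrays;
import java.util.List;

public class HashJoinChoiceSketch {
    // Hypothetical stand-ins for the interfaces quoted above (not the real API).
    interface HashFunctionFamily {}

    interface HashFunctionFamilyProvider {
        HashFunctionFamily getFamily(Object type);
    }

    // Mirrors the quoted provider: one shared family for every type, never null.
    static final HashFunctionFamily MURMUR = new HashFunctionFamily() {};
    static final HashFunctionFamilyProvider PROVIDER = type -> MURMUR;

    // Mirrors the quoted helper: one provider lookup per key type.
    static HashFunctionFamily[] familiesFor(List<?> keyTypes, HashFunctionFamilyProvider provider) {
        HashFunctionFamily[] families = new HashFunctionFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    // Mirrors the quoted check: the optimized join is chosen unless some family is null.
    static boolean useOptimizedHashJoin(HashFunctionFamily[] families) {
        for (HashFunctionFamily family : families) {
            if (family == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keyTypes = Arrays.asList("int32", "string", "double");

        // With the always-non-null provider, the optimized path is always taken.
        System.out.println(useOptimizedHashJoin(familiesFor(keyTypes, PROVIDER)));

        // A made-up provider that returns null for one type reaches the other branch.
        HashFunctionFamilyProvider partial = type -> "string".equals(type) ? null : MURMUR;
        System.out.println(useOptimizedHashJoin(familiesFor(keyTypes, partial)));
    }
}
```

The second call uses an invented provider that returns null for one type, which is the only situation in which the plain hybrid hash join branch would be reached.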
>>
>>
>>
>>
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
>>
>
