asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.
Date Sat, 19 Nov 2016 20:32:04 GMT
+1 for removal of now-defunct operations!


On 11/19/16 12:01 PM, Taewoo Kim wrote:
> Hi all,
>
> Please share your thoughts on this issue. In short, Grace Hash Join and
> Hybrid Hash Join are not being used; we only use Optimized Hybrid Hash
> Join. Therefore, I think it would be better to remove them.
> https://issues.apache.org/jira/browse/ASTERIXDB-1736
> ---------- Forwarded message ----------
> From: Taewoo Kim (JIRA) <jira@apache.org>
> Date: Fri, Nov 18, 2016 at 5:06 PM
> Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash
> Join are not being used.
> To: notifications@asterixdb.incubator.apache.org
>
>
> Taewoo Kim created ASTERIXDB-1736:
> -------------------------------------
>
>               Summary: Grace Hash Join and Hybrid Hash Join are not being
> used.
>                   Key: ASTERIXDB-1736
>                   URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
>               Project: Apache AsterixDB
>            Issue Type: Improvement
>              Reporter: Taewoo Kim
>              Assignee: Taewoo Kim
>
>
> As the title says, Grace Hash Join and Hybrid Hash Join are not being used.
> I suggest that we remove these two join methods. Here are my findings for
> these two joins.
>
> 1) Grace Hash Join
> GraceHashJoinOperatorDescriptor is only called from two places:
> org.apache.hyracks.examples.tpch.client.join and
> TPCHCustomerOrderHashJoinTest.
> One is a Hyracks example (tpch.client) and the other is a unit test. The
> optimizer never chooses this join during compilation, so it is currently
> unused.
>
> 2) Hybrid Hash Join
> During compilation, the optimizer decides whether to use Hybrid Hash Join
> or Optimized Hybrid Hash Join.
> If the hash function family for every key variable is set, then we use the
> optimized hybrid hash join.
> If not, we use the hybrid hash join. In practice, however, the hybrid
> hash join path will never be chosen. Let's check the code.
>
> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}
>          IBinaryHashFunctionFamily[] hashFunFamilies =
>                  JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch, env, context);
>
>          ...
>
>          boolean optimizedHashJoin = true;
>          for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>              if (family == null) {
>                  optimizedHashJoin = false;
>                  break;
>              }
>          }
>
>          if (optimizedHashJoin) {
>              opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>                      hashFunFamilies, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>          } else {
>              opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
>                      hashFunFactories, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>          }
> {code}
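
The selection logic quoted above boils down to a null check over the per-key hash function families. A minimal, self-contained sketch of just that decision follows. The names `JoinSelectionSketch`, `HashFamily`, and `useOptimizedHashJoin` are hypothetical stand-ins for illustration, not the actual Hyracks/Algebricks classes:

```java
import java.util.Arrays;
import java.util.Objects;

class JoinSelectionSketch {

    /** Hypothetical stand-in for IBinaryHashFunctionFamily; not the real interface. */
    interface HashFamily {
    }

    /**
     * Mirrors the quoted loop: the optimized join is chosen only if every key
     * has a non-null hash function family. An empty key list yields true,
     * just as the original flag starts out true.
     */
    static boolean useOptimizedHashJoin(HashFamily[] families) {
        return Arrays.stream(families).allMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        HashFamily family = new HashFamily() { };
        // All families present: the optimized path is taken.
        System.out.println(useOptimizedHashJoin(new HashFamily[] { family, family }));
        // One missing family: the only case that would fall back to plain hybrid hash join.
        System.out.println(useOptimizedHashJoin(new HashFamily[] { family, null }));
    }
}
```

So the fallback branch is reachable only if some provider can actually hand back a null family.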
>
> As we can see, optimizedHashJoin is set to false only when the hash function
> family for some key is null.
> Then, how do we assign the hash function family for each key variable?
>
> {code:title=JobGenHelper.java|borderStyle=solid}
>      public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
>              Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
>                      throws AlgebricksException {
>          IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
>          int i = 0;
>          IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
>          for (LogicalVariable var : varLogical) {
>              Object type = env.getVarType(var);
>              funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>          }
>          return funFamilies;
>      }
> {code}
>
> For each variable type, we ask the provider for a hash function family. In
> the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class
> that implements IBinaryHashFunctionFamilyProvider,
> and for any type it returns AMurmurHash3BinaryHashFunctionFamily.
> So, there is no way that the hash function family is null.
>
> {code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
> public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {
>
>      private static final long serialVersionUID = 1L;
>      public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();
>
>      private AqlBinaryHashFunctionFamilyProvider() {
>      }
>
>      @Override
>      public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
>          // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
>          return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>      }
>
> }
> {code}
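
To make the argument concrete, here is a rough sketch of the chain described above: a provider that ignores the type and always returns the same singleton family, so the resulting array can never contain null. `ProviderSketch`, `HashFamily`, `HashFamilyProvider`, `MURMUR`, and `familiesFor` are hypothetical names standing in for the real classes, and the type names in `main` are arbitrary placeholders:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ProviderSketch {

    /** Hypothetical stand-in for IBinaryHashFunctionFamily. */
    interface HashFamily {
    }

    /** Hypothetical stand-in for IBinaryHashFunctionFamilyProvider. */
    interface HashFamilyProvider {
        HashFamily getFamily(Object type);
    }

    /** Stand-in for AMurmurHash3BinaryHashFunctionFamily.INSTANCE: one shared instance. */
    static final HashFamily MURMUR = new HashFamily() { };

    /** Like the quoted provider: ignores the type and always returns the singleton. */
    static final HashFamilyProvider PROVIDER = type -> MURMUR;

    /** Mirrors variablesToBinaryHashFunctionFamilies: one family per key type. */
    static HashFamily[] familiesFor(List<?> keyTypes, HashFamilyProvider provider) {
        HashFamily[] families = new HashFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    public static void main(String[] args) {
        // The provider never inspects these values, so any type works.
        HashFamily[] families = familiesFor(Arrays.asList("int32", "string", "unknown-type"), PROVIDER);
        boolean optimized = Arrays.stream(families).allMatch(Objects::nonNull);
        System.out.println("optimized hash join chosen: " + optimized);
    }
}
```

With a single provider of this shape, the null check in the join operator cannot fail, which is why the plain hybrid hash join branch is effectively dead code.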
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

