asterixdb-dev mailing list archives

From Taewoo Kim <wangs...@gmail.com>
Subject Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.
Date Sat, 19 Nov 2016 20:01:15 GMT
Hi all,

Please share your thoughts on this issue. In short, Grace Hash Join and
Hybrid Hash Join are not being used; we only use Optimized Hybrid Hash
Join. Therefore, I think it would be better to remove them.
https://issues.apache.org/jira/browse/ASTERIXDB-1736
---------- Forwarded message ----------
From: Taewoo Kim (JIRA) <jira@apache.org>
Date: Fri, Nov 18, 2016 at 5:06 PM
Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash
Join are not being used.
To: notifications@asterixdb.incubator.apache.org


Taewoo Kim created ASTERIXDB-1736:
-------------------------------------

             Summary: Grace Hash Join and Hybrid Hash Join are not being used.
                 Key: ASTERIXDB-1736
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
             Project: Apache AsterixDB
          Issue Type: Improvement
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


As the title says, Grace Hash Join and Hybrid Hash Join are not being used.
I suggest that we remove these two join methods. Here are my findings for
these two joins.

1) Grace Hash Join
GraceHashJoinOperatorDescriptor is only referenced from two places:
org.apache.hyracks.examples.tpch.client.join and
TPCHCustomerOrderHashJoinTest.
One is a Hyracks example (tpch.client) and the other is a unit test. The
optimizer never chooses this join during compilation, so it is currently unused.

2) Hybrid Hash Join
During compilation, the optimizer decides whether to use Hybrid Hash Join
or Optimized Hybrid Hash Join.
If a hash function family is set for every key variable, we use the
optimized hybrid hash join.
If not, we use the hybrid hash join. In practice, however, the hybrid
hash join path is never chosen. Let's check the code.

{code:title=HybridHashJoinPOperator.java|borderStyle=solid}
        IBinaryHashFunctionFamily[] hashFunFamilies =
                JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch, env, context);

        ...

        boolean optimizedHashJoin = true;
        for (IBinaryHashFunctionFamily family : hashFunFamilies) {
            if (family == null) {
                optimizedHashJoin = false;
                break;
            }
        }

        if (optimizedHashJoin) {
            opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
                    hashFunFamilies, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        } else {
            opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight,
                    hashFunFactories, comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        }
{code}

As we can see, optimizedHashJoin is set to false only when at least one
hash function family is null.
So how is the hash function family assigned for each key variable?

{code:title=JobGenHelper.java|borderStyle=solid}
    public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
            Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
                    throws AlgebricksException {
        IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
        int i = 0;
        IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
        for (LogicalVariable var : varLogical) {
            Object type = env.getVarType(var);
            funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
        }
        return funFamilies;
    }
{code}

For each variable type, we ask the provider for a hash function family. In
the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class
that implements IBinaryHashFunctionFamilyProvider, and it returns
AMurmurHash3BinaryHashFunctionFamily for every type.
So the hash function family can never be null, and the hybrid hash join
branch is unreachable.

{code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {

    private static final long serialVersionUID = 1L;
    public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();

    private AqlBinaryHashFunctionFamilyProvider() {

    }

    @Override
    public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
        // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
        return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
    }

}
{code}
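
To make the argument concrete, here is a minimal, self-contained sketch of the
selection logic. It is not AsterixDB code: JoinChoiceSketch,
HashFunctionFamily, HashFunctionFamilyProvider, ALWAYS_MURMUR and the string
type names are hypothetical stand-ins for the real interfaces quoted above. It
only models the null check, under the assumption that the provider behaves as
described (the same family for every type).

```java
import java.util.Arrays;
import java.util.List;

public class JoinChoiceSketch {

    // Hypothetical stand-in for IBinaryHashFunctionFamily (real methods not modeled).
    interface HashFunctionFamily {
    }

    // Hypothetical stand-in for IBinaryHashFunctionFamilyProvider.
    interface HashFunctionFamilyProvider {
        HashFunctionFamily getFamily(Object type);
    }

    // Models the provider described above: one singleton family for every type.
    static final HashFunctionFamily MURMUR = new HashFunctionFamily() {
    };
    static final HashFunctionFamilyProvider ALWAYS_MURMUR = type -> MURMUR;

    // Models the per-variable lookup: one family per join key type.
    static HashFunctionFamily[] familiesFor(List<?> keyTypes, HashFunctionFamilyProvider provider) {
        HashFunctionFamily[] families = new HashFunctionFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    // Models the check in the physical operator: any null family disables the optimized join.
    static boolean usesOptimizedJoin(HashFunctionFamily[] families) {
        for (HashFunctionFamily family : families) {
            if (family == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keyTypes = Arrays.asList("int32", "string", "double");

        // With a provider that never returns null, the optimized join is always chosen.
        System.out.println(usesOptimizedJoin(familiesFor(keyTypes, ALWAYS_MURMUR))); // true

        // Only a provider that returns null for some type could reach the fallback branch.
        HashFunctionFamilyProvider partial = type -> "string".equals(type) ? null : MURMUR;
        System.out.println(usesOptimizedJoin(familiesFor(keyTypes, partial))); // false
    }
}
```

The second case shows what would be needed for the hybrid hash join path to
run: a provider that returns null for at least one key type. No such provider
exists in the codebase as described above.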

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
