drill-dev mailing list archives

From jacques-n <...@git.apache.org>
Subject [GitHub] drill pull request: DRILL-4237 DRILL-4478 fully implement hash to ...
Date Tue, 22 Mar 2016 01:31:37 GMT
Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/430#discussion_r56925560
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java
---
    @@ -17,47 +17,77 @@
      */
     package org.apache.drill.exec.expr.fn.impl;
     
    +import io.netty.buffer.DrillBuf;
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.common.exceptions.DrillConfigurationException;
    +
     import java.nio.ByteBuffer;
     import java.nio.ByteOrder;
     
    -public class HashHelper {
    +public abstract class HashHelper {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HashHelper.class);
    +  public static final String defaultHashClassName = new String("org.apache.drill.exec.expr.fn.impl.MurmurHash3");
    +  static final String HASH_CLASS_PROP = "drill.exec.hash.class";
     
    +  static String actualHashClassName = defaultHashClassName;
    +  static DrillHash hashCall = new MurmurHash3();
    +  static {
     
    -  /** taken from mahout **/
    -  public static int hash(ByteBuffer buf, int seed) {
    -    // save byte order for later restoration
    -
    -    int m = 0x5bd1e995;
    -    int r = 24;
    +    try {
    +      DrillConfig config = DrillConfig.create();
    +      String configuredClassName = config.getString(HASH_CLASS_PROP);
    +      if(configuredClassName != null && configuredClassName != "") {
    +        actualHashClassName = configuredClassName;
    +        hashCall = config.getInstanceOf(HASH_CLASS_PROP, DrillHash.class);
    +      }
    +      logger.debug("HashHelper initializes with " + actualHashClassName);
    +    }
    +    catch(Exception ex){
    +      logger.error("Could not initialize Hash %s", ex.getMessage());
    +    }
    +  }
     
    -    int h = seed ^ buf.remaining();
    +  public static String getHashClassName(){
    +    return actualHashClassName;
    +  }
     
    -    while (buf.remaining() >= 4) {
    -      int k = buf.getInt();
    +  public static int hash32(int val, long seed) {
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
    +  public static int hash32(long val, long seed) {
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
    +  public static int hash32(float val, long seed){
    +    double converted = val;
    +    return hash32(converted, seed);
    +  }
     
    -      k *= m;
    -      k ^= k >>> r;
    -      k *= m;
    +  public static int hash32(double val, long seed){
    +    return hashCall.hash32(val, seed);
    +  }
     
    -      h *= m;
    -      h ^= k;
    -    }
    +  public static  int hash32(int start, int end, DrillBuf buffer, int seed){
    +    return hashCall.hash32(start, end, buffer, seed);
    --- End diff --
    
    Yes, I'm worried about the extra performance hit. I believe we already spend a reasonable
amount of processing time applying hash functions and have considered it an opportunity for
improvement. Given your current construction, we would need to dereference the static hashCall
field every time we call the hash function. In my past analysis of the assembly the JVM emits,
this dereference typically was not optimized away. Binding directly to a static function avoids
this overhead. Take a look at the JVM bytecode (or assembly) to see the difference. In general,
our goal inside individual functions is to avoid indirection as much as possible, especially on
a hot path such as the hash function.
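
To make the difference concrete, here is a minimal sketch of the two call shapes. All names in it (Hasher, MixHasher, DirectHash, IndirectHash, HashDispatchDemo) are hypothetical and are not Drill classes, and the mixing function is a placeholder rather than MurmurHash3. It only illustrates the call shape; it does not measure anything.

```java
// Hypothetical illustration only: none of these types exist in Drill.
interface Hasher {
  int hash32(long val, int seed);
}

final class DirectHash {
  // Static method: a call site compiles to a single invokestatic,
  // with no field load in front of it.
  static int hash32(long val, int seed) {
    // Placeholder mixer (an assumption, not MurmurHash3).
    long h = (val ^ seed) * 0x9E3779B97F4A7C15L;
    return (int) (h ^ (h >>> 32));
  }
}

final class MixHasher implements Hasher {
  @Override
  public int hash32(long val, int seed) {
    return DirectHash.hash32(val, seed);
  }
}

final class IndirectHash {
  // Non-final static field, mirroring the shape of hashCall in the diff.
  static Hasher hashCall = new MixHasher();

  // Each call loads the field (getstatic) and then dispatches through
  // the interface (invokeinterface).
  static int hash32(long val, int seed) {
    return hashCall.hash32(val, seed);
  }
}

final class HashDispatchDemo {
  public static void main(String[] args) {
    // Both paths return the same value; only the call shape differs.
    System.out.println(DirectHash.hash32(42L, 7) == IndirectHash.hash32(42L, 7));
  }
}
```

Compiling this and running `javap -c IndirectHash` shows the getstatic/invokeinterface pair inside hash32, while a caller of DirectHash.hash32 contains only an invokestatic. Whether the JIT removes the extra load in practice depends on the JVM and the call site, so it is worth checking the generated assembly rather than assuming either way.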


