spark-reviews mailing list archives

From kiszk <...@git.apache.org>
Subject [GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Date Fri, 02 Mar 2018 11:10:11 GMT
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19222#discussion_r171821064
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
    @@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes,
         return fmix(h1, lengthInBytes);
       }
     
    +  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
    +    // This is compatible with original and another implementations.
    +    // Use this method for new components after Spark 2.3.
    +    long offset = base.getBaseOffset();
    +    int lengthInBytes = (int)base.size();
    +    assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative";
    +    int lengthAligned = lengthInBytes - lengthInBytes % 4;
    +    int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), seed);
    +    int k1 = 0;
    +    for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift += 8) {
    +      k1 ^= (base.getByte(offset + i) & 0xFF) << shift;
    +    }
    +    h1 ^= mixK1(k1);
    +    return fmix(h1, lengthInBytes);
    +  }
    +
    +  private static int hashBytesByIntBlock(MemoryBlock base, int seed) {
    --- End diff --
    
    We can do that if we set performance considerations aside.
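
For reference, the block-then-tail structure of the quoted method can be sketched on a plain `byte[]` instead of a `MemoryBlock`. This is a minimal illustration, not the code in the pull request: the class name `Murmur3Sketch` and the method `hashBytes` are hypothetical, the 4-byte words are read as little-endian (an assumption), and the constants are the standard Murmur3 x86_32 ones rather than anything copied from this diff.

```java
// Hypothetical stand-alone sketch of Murmur3 x86_32 with explicit tail handling.
// It mirrors the shape of the quoted method (aligned words first, then the
// remaining 0-3 bytes), but operates on a byte[] rather than a MemoryBlock.
final class Murmur3Sketch {
  private static final int C1 = 0xcc9e2d51;
  private static final int C2 = 0x1b873593;

  private Murmur3Sketch() {}

  // Scramble one 32-bit input word before it is mixed into the state.
  static int mixK1(int k1) {
    k1 *= C1;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= C2;
    return k1;
  }

  // Fold a scrambled word into the running hash state.
  static int mixH1(int h1, int k1) {
    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    return h1 * 5 + 0xe6546b64;
  }

  // Final avalanche step; the input length is folded in here.
  static int fmix(int h1, int length) {
    h1 ^= length;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  static int hashBytes(byte[] data, int seed) {
    int lengthInBytes = data.length;
    // Largest multiple of 4 that fits in the input.
    int lengthAligned = lengthInBytes - lengthInBytes % 4;
    int h1 = seed;
    // Aligned part: consume 4 bytes at a time as a little-endian int (assumption).
    for (int i = 0; i < lengthAligned; i += 4) {
      int word = (data[i] & 0xFF)
          | ((data[i + 1] & 0xFF) << 8)
          | ((data[i + 2] & 0xFF) << 16)
          | ((data[i + 3] & 0xFF) << 24);
      h1 = mixH1(h1, mixK1(word));
    }
    // Tail: pack the remaining 0-3 bytes into one word, lowest byte first.
    int k1 = 0;
    for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift += 8) {
      k1 ^= (data[i] & 0xFF) << shift;
    }
    h1 ^= mixK1(k1);
    return fmix(h1, lengthInBytes);
  }
}
```

Calling `Murmur3Sketch.hashBytes(bytes, seed)` on a 5-byte input runs one aligned word through the first loop and one byte through the tail loop. The sketch reads bytes one at a time for clarity, so it says nothing about the performance of the `MemoryBlock` path under discussion.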



