parquet-commits mailing list archives

From ziva...@apache.org
Subject [parquet-format] branch master updated: PARQUET-1609: Specify which xxhash carefully (#143)
Date Fri, 12 Jul 2019 09:26:27 GMT
This is an automated email from the ASF dual-hosted git repository.

zivanfi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 9bdd844  PARQUET-1609: Specify which xxhash carefully (#143)
9bdd844 is described below

commit 9bdd844300dfd2e15c622c8f3edaf89e52005d1e
Author: Jim Apple <github.public@jbapple.com>
AuthorDate: Fri Jul 12 02:26:22 2019 -0700

    PARQUET-1609: Specify which xxhash carefully (#143)
    
    The hash function "xxhash" is actually a number of different hash
    functions including xxHash, XXH64, XXH32, and XXH3. Additionally,
    these hash functions accept "seeds", as most modern hash functions do,
    including MurmurHash variants.
    
    This patch specifies that the BloomFilter hash function default is
    XXH64 with a seed of 0. It omits the confusing note about the ISA and
    different variants of xxHash, since XXH64 is apparently
    architecture-independent.
---
 BloomFilter.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/BloomFilter.md b/BloomFilter.md
index a01481e..e51b412 100644
--- a/BloomFilter.md
+++ b/BloomFilter.md
@@ -108,9 +108,9 @@ void Mask(uint32_t key, uint32_t mask[8]) {
 
 #### Hash Function
 The function used to hash values in the initial implementation is
-[xxHash](https://cyan4973.github.io/xxHash/), using the least-significant 64 bits version of the
-function on the x86-64 platform. Note that a given variant, such as XXHash64, shall produces same
-output irrespective of the cpu/os used, though different variants may produce different values.
+[xxHash](https://cyan4973.github.io/xxHash/), using the function XXH64 with a
+seed of 0 and [following the specification version
+0.1.1](https://github.com/Cyan4973/xxHash/blob/v0.7.0/doc/xxhash_spec.md).
 
 #### Build a Bloom filter
 The fact that exactly eight bits are checked during each lookup means that these filters

