orc-dev mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Bloom filter hash broken
Date Wed, 07 Sep 2016 16:34:35 GMT
+1 to 1-3. On 4, what do you mean by "test"? Assume the file was written with
the default encoding and use that? Is there a versioning concept in the bloom
filters that will make it easy to determine whether a file is pre- or post-ORC-101?

Alan.

> On Sep 7, 2016, at 08:57, Owen O'Malley <omalley@apache.org> wrote:
> 
> All,
>   Dain Sundstrom pointed out to me in a personal email that the ORC bloom
> filters are currently using the default character encoding. That makes the
> bloom filters non-portable between different computers that use different
> default encodings. I've filed ORC-101 to address it, but I want to have a
> wider discussion. I'd propose that we:
> 
> 1. create a new WriterVersion for ORC-101.
> 2. move the bloom filter code from storage-api into ORC.
> 3. consistently use UTF-8 when creating new bloom filters.
> 4. for ORC files older than ORC-101, test the default encoding instead of
> UTF-8
> 
> Thoughts?
> 
> .. Owen
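
For readers of the archive, the portability problem and the version-based fallback in points 3 and 4 can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name, the `hashString` and `charsetFor` helpers, the `writtenAfterOrc101` flag, and the use of `Arrays.hashCode` as a stand-in hash. It is not the ORC bloom filter code, which uses its own hash function and its own WriterVersion check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BloomEncodingSketch {
    // Hypothetical stand-in for the bloom filter hash; it only shows that
    // the hash depends on the encoded bytes, not on the String itself.
    static int hashString(String value, Charset charset) {
        return Arrays.hashCode(value.getBytes(charset));
    }

    // Hypothetical reader-side choice: files written after the fix are
    // assumed to use UTF-8, older files whatever the platform default is.
    static Charset charsetFor(boolean writtenAfterOrc101) {
        return writtenAfterOrc101
            ? StandardCharsets.UTF_8
            : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, written as an escape so the
        // source file's own encoding does not matter.
        String value = "caf\u00e9";

        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes:      " + Arrays.toString(utf8));
        System.out.println("ISO-8859-1 bytes: " + Arrays.toString(latin1));

        // A writer on one machine and a reader on another would probe
        // the filter with different hashes for the same string.
        boolean sameHash =
            hashString(value, StandardCharsets.UTF_8)
                == hashString(value, StandardCharsets.ISO_8859_1);
        System.out.println("same hash: " + sameHash);

        System.out.println("charset for new files: " + charsetFor(true));
        System.out.println("charset for old files: " + charsetFor(false));
    }
}
```

UTF-8 encodes the accented character as two bytes and ISO-8859-1 as one, so the two byte arrays differ and a filter built under one default encoding can miss values when probed under the other. Pure ASCII strings encode identically in both, which is why the problem can go unnoticed in tests.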

