orc-dev mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Bloom filter hash broken
Date Wed, 07 Sep 2016 17:06:37 GMT
Item 4 applies when the bloom filter is used for predicate push down. I'm
saying that for old files the reader should encode the search value with the
default encoding when checking the bloom filter. The other option is for
predicate push down to always answer "maybe" if the file is an old one.
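
As a rough illustration of the two options for item 4, here is a minimal
Python sketch. Everything in it is hypothetical: TinyBloom, probe_charset,
pushdown_might_match and the written_after_orc101 flag are made-up names, and
the hashing is not what ORC uses. It only shows the decision being discussed:
probe new files with UTF-8, and for old files either probe with the default
encoding or skip the filter and answer "maybe".

```python
import hashlib
import locale


class TinyBloom:
    """Toy Bloom filter over byte strings (NOT ORC's hash or bit layout)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python int

    def _positions(self, data: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, data: bytes):
        for pos in self._positions(data):
            self.bits |= 1 << pos

    def might_contain(self, data: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(data))


def probe_charset(written_after_orc101: bool, default_charset: str) -> str:
    """Pick the charset used to encode the search value before probing."""
    return "utf-8" if written_after_orc101 else default_charset


def pushdown_might_match(bloom, value, written_after_orc101,
                         default_charset=None, trust_old_filters=True):
    """Return False only when the filter proves the value is absent."""
    if default_charset is None:
        # Assumption: the platform's preferred encoding stands in for the
        # "default encoding" mentioned in the thread.
        default_charset = locale.getpreferredencoding(False)
    if not written_after_orc101 and not trust_old_filters:
        # Second option: never rule out an old file on its bloom filter.
        return True
    charset = probe_charset(written_after_orc101, default_charset)
    # Unmappable characters are replaced instead of raising an error.
    return bloom.might_contain(value.encode(charset, errors="replace"))
```

With the first option, an old file is only read correctly if the reader's
default encoding matches the writer's. The second option gives up the
filtering benefit on old files but can never skip data that matches.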

.. Owen

On Wed, Sep 7, 2016 at 9:34 AM, Alan Gates <alanfgates@gmail.com> wrote:

> +1 to 1-3.  On 4, what do you mean by test?  Assume it’s the default
> encoding and use that?  Is there a versioning concept in the bloom filters
> that will make it easy to determine if this is pre or post ORC-101?
>
> Alan.
>
> > On Sep 7, 2016, at 08:57, Owen O'Malley <omalley@apache.org> wrote:
> >
> > All,
> >   Dain Sundstrom pointed out to me in personal email that the ORC bloom
> > filters are currently using the default character encoding. That makes the
> > bloom filters non-portable between different computers that use different
> > default encodings. I've filed ORC-101 to address it, but I want to have a
> > wider discussion. I'd propose that we:
> >
> > 1. create a new WriterVersion for ORC-101.
> > 2. move the bloom filter code from storage-api into ORC.
> > 3. consistently use UTF-8 when creating new bloom filters
> > 4. for ORC files older than ORC-101, test the default encoding instead of
> > UTF-8
> >
> > Thoughts?
> >
> > .. Owen
>
>
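
The portability problem described in the quoted message can be reproduced
outside ORC. The short Python sketch below is only an illustration; the helper
names encoded_forms and is_portable and the choice of charsets are assumptions
made for this example. It shows that a non-ASCII string yields different bytes
under different encodings, so a filter built from those bytes on one machine
would be probed with different bytes on another.

```python
def encoded_forms(value, charsets=("utf-8", "latin-1", "cp1252")):
    """Map each charset name to the bytes it produces for value."""
    return {name: value.encode(name) for name in charsets}


def is_portable(value, charsets=("utf-8", "latin-1", "cp1252")):
    """True when every listed charset encodes value to identical bytes."""
    return len(set(encoded_forms(value, charsets).values())) == 1


if __name__ == "__main__":
    for text in ("orc", "café"):
        print(text, is_portable(text), encoded_forms(text))
```

Pure ASCII values encode identically in these three charsets, which is why
the problem only shows up for strings with non-ASCII characters.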
