orc-dev mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Bloom filter hash broken
Date Wed, 07 Sep 2016 15:57:23 GMT
All,
   Dain Sundstrom pointed out to me in a personal email that the ORC bloom
filters are currently using the default character encoding when converting
strings to bytes. That makes the bloom filters non-portable between computers
that use different default encodings. I've filed ORC-101 to address it, but I
want to have a wider discussion. I'd propose that we:

1. create a new WriterVersion for ORC-101.
2. move the bloom filter code from storage-api into ORC.
3. consistently use UTF-8 when creating new bloom filters
4. for ORC files written before ORC-101, test the bloom filters using the
default encoding instead of UTF-8
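
A minimal sketch of the problem and of items 3 and 4 follows. The hash here is a stand-in (64-bit FNV-1a), not the hash ORC's bloom filter actually uses, and the WriterVersion constants PRE_ORC_101 and ORC_101 are hypothetical names, not ORC's real enum values. It shows that the same string yields different bytes, and therefore different hashes, under different encodings. It also shows a reader choosing UTF-8 for new files and falling back to its own default encoding for older ones, which only matches the writer when both machines share that default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomFilterEncodingSketch {

  // Hypothetical writer versions; only "before" vs "after" ORC-101 matters here.
  enum WriterVersion { PRE_ORC_101, ORC_101 }

  // Stand-in hash (64-bit FNV-1a), NOT the hash ORC's bloom filter uses.
  static long fnv1a64(byte[] data) {
    long hash = 0xcbf29ce484222325L;   // FNV offset basis
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= 0x100000001b3L;          // FNV prime
    }
    return hash;
  }

  // Items 3 and 4: files from ORC-101 writers always hash UTF-8 bytes; for
  // older files the reader falls back to its own default encoding, which only
  // reproduces the writer's bytes if both machines share the same default.
  static Charset charsetFor(WriterVersion version) {
    return version == WriterVersion.ORC_101
        ? StandardCharsets.UTF_8
        : Charset.defaultCharset();
  }

  static long hashString(String value, Charset charset) {
    return fnv1a64(value.getBytes(charset));
  }

  public static void main(String[] args) {
    // "cafe" with an accented final e, written as a Unicode escape so the
    // source file's own encoding cannot change the literal.
    String value = "caf\u00e9";
    // Same string, different bytes depending on the encoding.
    System.out.println(Arrays.toString(value.getBytes(StandardCharsets.UTF_8)));      // [99, 97, 102, -61, -87]
    System.out.println(Arrays.toString(value.getBytes(StandardCharsets.ISO_8859_1))); // [99, 97, 102, -23]
    // Different bytes give different hashes, so a writer and a reader using
    // different encodings would probe different bloom filter bits.
    System.out.println(hashString(value, StandardCharsets.UTF_8)
        == hashString(value, StandardCharsets.ISO_8859_1));
  }
}
```

Pure ASCII strings encode identically in UTF-8 and ISO-8859-1, so the mismatch only shows up for non-ASCII values, which is why it can go unnoticed.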

Thoughts?

.. Owen
