hive-user mailing list archives

From "Owen O'Malley" <>
Subject Re: Why does ORC use Deflater instead of native ZlibCompressor?
Date Thu, 23 Jun 2016 21:35:25 GMT
On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <> wrote:

> Hello,
> I recently looked at ORC encoding and noticed
> that it uses Java's Deflater and not
> Hadoop's native ZlibCompressor.
> Can someone please tell me what is the reason for it?

It is more subtle than that. The first piece to notice is that if your
Hadoop has direct decompression support, it will be used. The reason that
the ZlibCompressor isn't used is that ORC needs a different API. In
particular, ORC doesn't use stream compression, but rather block
compression: each compression block is compressed independently. That is
done so that it can jump over compression blocks for predicate push down.
(If you are skipping over a lot of values, ORC doesn't need to decompress
the bytes.)
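The idea above can be sketched in plain java.util.zip. This is an illustrative toy format, not ORC's actual on-disk layout: the class name, the 4-byte length header, and the method names are all invented here for the example. Each chunk is deflated independently and prefixed with its compressed length, so a reader can seek past chunks it doesn't need without ever inflating them:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockCompressionSketch {

    // Compress each chunk independently and prefix it with a 4-byte
    // compressed-length header. A stream compressor could not be skipped
    // like this, because its state spans the whole stream.
    static byte[] compressBlocks(byte[][] chunks) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            Deflater deflater = new Deflater();
            deflater.setInput(chunk);
            deflater.finish();
            byte[] buf = new byte[chunk.length * 2 + 64];
            int len = 0;
            while (!deflater.finished()) {
                len += deflater.deflate(buf, len, buf.length - len);
            }
            deflater.end();
            out.write(ByteBuffer.allocate(4).putInt(len).array()); // header
            out.write(buf, 0, len);
        }
        return out.toByteArray();
    }

    // Decompress only the block at index `target`. Earlier blocks are
    // skipped by their recorded lengths -- their bytes are never inflated,
    // which is what makes predicate push down cheap.
    static byte[] readBlock(byte[] data, int target) throws Exception {
        ByteBuffer in = ByteBuffer.wrap(data);
        for (int i = 0; i < target; i++) {
            in.position(in.position() + in.getInt()); // skip compressed bytes
        }
        int len = in.getInt();
        byte[] compressed = new byte[len];
        in.get(compressed);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[1 << 16];
        int n = inflater.inflate(buf);
        inflater.end();
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) throws Exception {
        byte[][] chunks = {
            "values the predicate filters out".getBytes(StandardCharsets.UTF_8),
            "values the reader actually needs".getBytes(StandardCharsets.UTF_8)
        };
        byte[] file = compressBlocks(chunks);
        // Jump straight to block 1; block 0 is skipped, not decompressed.
        System.out.println(new String(readBlock(file, 1), StandardCharsets.UTF_8));
    }
}
```

Note that this needs an exact per-block compressed length in the header; ORC's real format differs, but the skippability property is the same.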

.. Owen

> Also, how does performance of Deflater (which also uses native
> implementation) compare to Hadoop's native zlib implementation?
> Thanks,
> Aleksei
