hive-user mailing list archives

From Aleksei Statkevich <astatkev...@rocketfuel.com>
Subject Re: Why does ORC use Deflater instead of native ZlibCompressor?
Date Fri, 24 Jun 2016 00:26:52 GMT
It might be a good idea. Though, I'm also wondering about the performance
difference between the two. Since they both use native implementations,
theoretically they can be close in performance. Are there any benchmarks
for them?

*Aleksei Statkevich *| Engineering Manager

<http://www.google.com/url?q=http%3A%2F%2Frocketfuel.com%2F&sa=D&sntz=1&usg=AFrqEzfAQ9xih8SV05CiYtvyyIAKLzpX2g>

<https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Frocketfuelinc&sa=D&sntz=1&usg=AFrqEzdmS-VfAbRejUE27Yrsp6UaaAoUdw>

<https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Frocketfuelinc%2F&sa=D&sntz=1&usg=AFrqEzc8zstBb-QJdiYqd7m9Wmmt-UHs7A>

<https://www.google.com/url?q=https%3A%2F%2Fwww.instagram.com%2Frocketfuellife%2F&sa=D&sntz=1&usg=AFrqEzf8veiDVVhTCQnpUnRttXonn6y9-g>

<https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Frocket-fuel-inc-&sa=D&sntz=1&usg=AFrqEzcvsj2bSqJ_SYc8qpQWQJnXXEjvLQ>

<https://www.google.com/url?q=https%3A%2F%2Fwww.glassdoor.com%2FOverview%2FWorking-at-Rocket-Fuel-EI_IE286428.11%2C22.htm&sa=D&sntz=1&usg=AFrqEzf6IUelwlAKdidiiJ3wTFdjnigQVg>

On Thu, Jun 23, 2016 at 5:00 PM, Owen O'Malley <omalley@apache.org> wrote:

> Actually, that should work. I'm a little concerned about the memory copy
> that the Hadoop ZlibCompressor does, but it should be a win. If you want to
> work on it, why don't you create a jira on the orc project? Don't forget
> that you'll need to handle the other options in CompressionCodec.modify.
>
> .. Owen
>
> On Thu, Jun 23, 2016 at 3:59 PM, Aleksei Statkevich <
> astatkevich@rocketfuel.com> wrote:
>
>> Hi Owen,
>>
>> Thanks for the response. I saw that DirectDecompressor will be used if
>> available and the difference was only in compression.
>> Keeping in mind what you said, I looked at the code again. I see that the
>> only specific piece that ORC uses is "nowrap" = true in Deflater. As far as
>> I understand from the description, it should directly correspond
>> to CompressionHeader.NO_HEADER in ZlibCompressor. In this case,
>> ZlibCompressor with the right setup can be a replacement for Deflater. What
>> do you think?
>>
>> Aleksei
>>
>> On Thu, Jun 23, 2016 at 2:35 PM, Owen O'Malley <omalley@apache.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <
>>> astatkevich@rocketfuel.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I recently looked at ORC encoding and noticed
>>>> that hive.ql.io.orc.ZlibCodec uses Java's java.util.zip.Deflater and not
>>>> Hadoop's native ZlibCompressor.
>>>>
>>>> Can someone please tell me the reason for this?
>>>>
>>>
>>> It is more subtle than that. The first piece to notice is that if your
>>> Hadoop has the direct decompression
>>> (org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be
>>> used. ZlibCompressor isn't used because ORC needs a different API. In
>>> particular, ORC doesn't use stream compression, but rather block
>>> compression. That is done so that it can jump over compression blocks
>>> for predicate pushdown. (If you are skipping over a lot of values, ORC
>>> doesn't need to decompress the bytes.)
>>>
>>> .. Owen
>>>
>>>
>>>
>>>>
>>>> Also, how does the performance of Deflater (which also uses a native
>>>> implementation) compare to Hadoop's native zlib implementation?
>>>>
>>>> Thanks,
>>>> Aleksei
>>>>
>>>>
>>>
>>
>
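
Editorial sketch of the "nowrap" point raised in the thread: with nowrap = true, java.util.zip.Deflater emits a raw DEFLATE stream, while the default mode wraps the same stream in a 2-byte zlib header and a 4-byte Adler-32 trailer. The class and method names below (NowrapDemo, deflate, inflate) are hypothetical and are not ORC or Hadoop code. Whether Hadoop's CompressionHeader.NO_HEADER yields byte-identical output is the assumption discussed above and is not verified here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class NowrapDemo {
    // Compress the whole input in one shot.
    // nowrap = true produces raw DEFLATE; nowrap = false produces zlib-wrapped output.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Decompress into a buffer of the known original length.
    static byte[] inflate(byte[] compressed, boolean nowrap, int originalLength) {
        Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    break; // no progress: input exhausted or stream finished early
                }
                off += n;
            }
            return Arrays.copyOf(out, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = "ORC ORC ORC ORC ORC ORC ORC ORC".getBytes(StandardCharsets.UTF_8);
        byte[] raw = deflate(data, true);
        byte[] wrapped = deflate(data, false);
        System.out.println("raw bytes: " + raw.length + ", zlib-wrapped bytes: " + wrapped.length);
        System.out.println("round trip ok: " + Arrays.equals(inflate(raw, true, data.length), data));
    }
}
```

The two outputs differ only in the wrapper, so a reader built with Inflater(true) can consume the raw form directly.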

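Editorial sketch of the block-compression point from the thread: when each block is compressed independently, a reader that knows a block's offset and length can inflate just that block and never touch the bytes before it. The index layout and the names here (BlockSkipDemo, BlockRef, readBlock) are hypothetical and chosen for illustration; this is not ORC's actual on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockSkipDemo {
    // Hypothetical index entry: where a compressed block lives and how large it is.
    static final class BlockRef {
        final int offset;
        final int compressedLength;
        final int rawLength;

        BlockRef(int offset, int compressedLength, int rawLength) {
            this.offset = offset;
            this.compressedLength = compressedLength;
            this.rawLength = rawLength;
        }
    }

    // Compress one block on its own with a fresh raw-DEFLATE compressor,
    // so the block does not depend on any earlier block's history.
    static byte[] compressBlock(byte[] block) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inflate only the byte range named by ref; earlier blocks are never read.
    static byte[] readBlock(byte[] file, BlockRef ref) {
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(file, ref.offset, ref.compressedLength);
            byte[] out = new byte[ref.rawLength];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    throw new IllegalStateException("truncated block");
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
    }

    // Write three blocks back to back, then read only the last one.
    static String demo() {
        String[] blocks = {
            "block-0: rows 0-999", "block-1: rows 1000-1999", "block-2: rows 2000-2999"
        };
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        BlockRef[] index = new BlockRef[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            byte[] raw = blocks[i].getBytes(StandardCharsets.UTF_8);
            byte[] compressed = compressBlock(raw);
            index[i] = new BlockRef(file.size(), compressed.length, raw.length);
            file.write(compressed, 0, compressed.length);
        }
        return new String(readBlock(file.toByteArray(), index[2]), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a single continuous compressed stream, reaching the third block would require inflating the first two; independent blocks avoid that work.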