impala-dev mailing list archives

From Jim Apple <jbap...@cloudera.com>
Subject Re: how add lzma compression to Parquet in Impala
Date Tue, 13 Jun 2017 15:26:24 GMT
If I am reading that discussion correctly, then there is public domain
lzma code that can help you do what you would like to do. Thanks for
looking into this!

On Mon, Jun 12, 2017 at 4:22 PM, 孙清孟 <sqm2050@gmail.com> wrote:
> Hi Jim and Tim:
> Thanks for your reply.
> I know about the APL and GPL. Here is some discussion of Hadoop support for lzma:
> https://issues.apache.org/jira/browse/HADOOP-6837
>
> 2017-06-12 23:40 GMT+08:00 Jim Apple <jbapple@cloudera.com>:
>
>> Because Impala is part of the ASF, it cannot contain any GPL code.
>>
>> https://www.apache.org/legal/resolved.html
>>
>> "However, if the component is only needed for optional features, a
>> project can provide the user with instructions on how to obtain and
>> install the non-included work. Optional means that the component is
>> not required for standard use of the product or for the product to
>> achieve a desirable level of quality. The question to ask yourself in
>> this situation is: 'Will the majority of users want to use my product
>> without adding the optional components?'"
>>
>> As I understand it, this is the rule by which Impala can use
>> https://github.com/cloudera/impala-lzo
>>
>> On Mon, Jun 12, 2017 at 8:30 AM, Tim Armstrong <tarmstrong@cloudera.com>
>> wrote:
>> > You would need to add a new codec to the Impala source tree. The codecs
>> > are implemented in be/src/util/codec.h, be/src/util/compress.h, and
>> > be/src/util/decompress.h. There are a few other places you may need to
>> > change. I would just "git grep -i gzip" to see how the gzip codec is
>> > implemented.
>> >
>> > For compressed text files you would also need to add support to the
>> > frontend, e.g. in
>> > fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
>> >
>> > I'm also not sure if there are any licensing issues here since the XZ
>> > library is GPL licensed.
>> >
>> > On Sat, Jun 10, 2017 at 5:41 PM, 孙清孟 <sqm2050@gmail.com> wrote:
>> >
>> >> I have added an lzma codec (hadoop-xz) to Parquet for Hive by modifying
>> >> parquet-format and parquet-mr, and it gets a higher compression ratio.
>> >>
>> >> But how do I add a new codec to Impala?
>> >>
>>
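
The compression-ratio trade-off raised in the original question can be checked quickly outside of Impala. The following is a minimal sketch using only Python's standard `zlib` and `lzma` modules; it does not touch Parquet, hadoop-xz, or Impala's codec classes. The helper name `compression_report` and the sample rows are hypothetical, chosen only for illustration, and real ratios depend heavily on the data.

```python
import lzma
import zlib


def compression_report(data: bytes) -> dict:
    """Compress data with DEFLATE (as used by gzip) and LZMA and return sizes.

    Hypothetical helper for illustration; not part of Impala or Parquet.
    """
    deflated = zlib.compress(data, 9)    # highest DEFLATE level
    xz = lzma.compress(data, preset=6)   # default xz preset

    # Confirm both codecs round-trip the input before comparing sizes.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data

    return {"raw": len(data), "deflate": len(deflated), "lzma": len(xz)}


# Made-up, highly repetitive sample standing in for column data.
sample = b"".join(
    b"row-%d,status=OK,region=us-west\n" % i for i in range(20000)
)

report = compression_report(sample)
for name, size in report.items():
    print(f"{name:8s} {size:8d} bytes  ratio {report['raw'] / size:.2f}")
```

A comparison like this only measures size. LZMA is typically slower to compress and decompress than DEFLATE, so scan speed would also need measuring before adopting it as a Parquet codec.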
