flume-user mailing list archives

From Matt Sicker <boa...@gmail.com>
Subject Re: Avro to Parquet conversion
Date Wed, 30 Aug 2017 22:22:36 GMT
I implemented something similar to this recently. What you can do is mount
a tmpfs, batch up GenericRecords, write them to a Parquet file in the
tmpfs, then read the file back into a byte[] to do with as you wish.

On 30 August 2017 at 13:17, Mike Percy <mpercy@apache.org> wrote:

> I know that this reply is quite late. I'm not aware of any Flume Parquet
> writer that currently exists. If it were me, I would stream the data to
> HDFS in Avro format and then use an ETL job (perhaps via Spark or Impala)
> to convert the Avro to Parquet in large batches. Parquet is well suited to
> large batches of records due to its columnar nature.
>
> Mike
>
> On Sun, Jul 16, 2017 at 11:24 PM, Kumar, Ashok 6. (Nokia - IN/Bangalore) <
> ashok.6.kumar@nokia.com> wrote:
>
>> Hi all,
>>
>>
>>
>> I have Avro data coming from Kafka and I want to convert it to Parquet
>> using Flume. I am not sure how to do it. Can anyone help me out with this?
>>
>>
>>
>> Regards,
>>
>> Ashok
>>
>
>
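Mike's batch-ETL suggestion above could look roughly like the following PySpark fragment. This is a schematic, pseudocode-level sketch, not a tested job: the HDFS paths are hypothetical, and it assumes the spark-avro package is available to the Spark session.

```python
# Schematic only: paths and spark-avro availability are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-to-parquet").getOrCreate()

# Read a large batch of Avro files that Flume streamed to HDFS...
df = spark.read.format("avro").load("hdfs:///flume/events/*.avro")

# ...and rewrite them as Parquet, which benefits from big columnar batches.
df.write.mode("append").parquet("hdfs:///warehouse/events_parquet")
```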


-- 
Matt Sicker <boards@gmail.com>
