flume-user mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: Can HDFSSink write headers as well?
Date Fri, 24 Aug 2012 02:38:50 GMT
Thanks for this; it's what I'm looking for. Binary Avro should work well for
me, thanks.

On Wed, Aug 22, 2012 at 7:59 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

>
>
> On Tue, Aug 21, 2012 at 8:16 PM, ashutosh (Open Platform Development Team) <
> sharma.ashutosh@kt.com> wrote:
>
>>  Hi All,
>>
>>
>>
>> I am using the "avro_event" serializer with the writable format and the
>> DataStream file type to store events in HDFS.
>>
>> I would like to read the files for further analysis. I am new to Avro and
>> don't know how to develop a deserializer to read Flume's events written
>> to an HDFS file.
>>
>>
>>
>> If anyone could share a sample or example, I would appreciate it.
>> Please help.
>>
>>
>>
>
> Look at this test to see how to read the data. In general, though, you
> would want to create your own serializer specific to your schema.
> Otherwise, it makes sense to just use sequence files.
>
>
> http://svn.apache.org/repos/asf/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/TestFlumeEventAvroEventSerializer.java
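For reading whole files you would normally use the Avro library itself (for example `DataFileReader` with a `GenericDatumReader`). To show what each binary record actually contains, here is a stdlib-only Java sketch that decodes a single record, assuming the record schema used by `FlumeEventAvroEventSerializer` is `{headers: map<string>, body: bytes}`. It deliberately ignores the surrounding Avro container file (file header, sync markers, block counts, compression codecs), so on its own it cannot read an HDFS file; `FlumeEventRecordDecoder` and `DecodedEvent` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for ONE Avro-binary record with the assumed schema
// {headers: map<string>, body: bytes}. Container-file parsing is not handled.
final class FlumeEventRecordDecoder {

    /** A decoded event: header map plus raw body bytes. */
    static final class DecodedEvent {
        final Map<String, String> headers;
        final byte[] body;
        DecodedEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Reads an Avro "long": a zigzag-encoded variable-length integer. */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1); // undo zigzag
    }

    /** Reads Avro "bytes": a long length followed by that many bytes. */
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) throw new EOFException("truncated bytes");
            off += n;
        }
        return buf;
    }

    /** Reads an Avro "string": same layout as bytes, decoded as UTF-8. */
    static String readString(InputStream in) throws IOException {
        return new String(readBytes(in), StandardCharsets.UTF_8);
    }

    /** Decodes one record: the headers map first, then the body bytes. */
    static DecodedEvent decode(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        long count;
        // A map is a series of blocks, terminated by a block whose count is 0.
        while ((count = readLong(in)) != 0) {
            if (count < 0) {   // negative count means a block byte size follows
                count = -count;
                readLong(in);  // skip the size; every entry is decoded anyway
            }
            for (long i = 0; i < count; i++) {
                String key = readString(in);
                headers.put(key, readString(in));
            }
        }
        return new DecodedEvent(headers, readBytes(in));
    }

    static DecodedEvent decode(byte[] data) throws IOException {
        return decode(new ByteArrayInputStream(data));
    }
}
```

Inside an uncompressed container-file data block the records are simply concatenated, so `decode(InputStream)` could be called once per record; locating and decompressing those blocks is the part the Avro library handles for you.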
>
>
>>  Thanks & Regards,
>>
>> Ashutosh Sharma
>>
>>
>>
>> *From:* Bhaskar V. Karambelkar [mailto:bhaskarvk@gmail.com]
>> *Sent:* Wednesday, August 22, 2012 12:22 AM
>> *To:* user@flume.apache.org
>> *Subject:* Re: Can HDFSSink write headers as well?
>>
>>
>>
>>
>>
>> On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <
>> birchall@infoscience.co.jp> wrote:
>>
>> Hi David,
>>
>> Currently there is no way to write headers to HDFS using the built-in
>> Flume functionality.
>>
>>
>>
>> This is not entirely true. The following combination will write headers
>> to HDFS in the Avro data file format (binary):
>>
>>
>>
>> agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
>>
>> agent.sinks.hdfsBinarySink.serializer = avro_event
>>
>> agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
>>
>>
>>
>> The serializer used is part of the Flume distribution, namely:
>>
>>
>> flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
>>
>>
>>
>> A file written this way can be processed with the Avro MapReduce API found
>> in the Avro distribution.
>>
>>
>>
>> Also note that using DataStream alone doesn't make it a text file; the
>> serializer and hdfs.writeFormat also determine whether the file is text or
>> binary.
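To make the text-versus-binary distinction concrete, here is a stdlib-only Java sketch that encodes the same event two ways. `asTextLine` approximates a body-only text serializer (body plus a newline, headers dropped), roughly what Flume's default text serializer produces; `asAvroRecord` writes the Avro binary encoding of a record with the assumed schema `{headers: map<string>, body: bytes}`. Neither method is Flume code (`EventEncodings` is a hypothetical name), and the real avro_event serializer additionally wraps records in an Avro container file, which this sketch omits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical illustration: the same event written as text vs. Avro binary.
final class EventEncodings {

    /** Text-style output: body followed by '\n'; the headers are discarded. */
    static byte[] asTextLine(Map<String, String> headers, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(body, 0, body.length);
        out.write('\n');
        return out.toByteArray();
    }

    /** Avro binary encoding of {headers: map<string>, body: bytes}. */
    static byte[] asAvroRecord(Map<String, String> headers, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!headers.isEmpty()) {
            writeLong(out, headers.size()); // one map block holding every entry
            for (Map.Entry<String, String> e : headers.entrySet()) {
                writeBytes(out, e.getKey().getBytes(StandardCharsets.UTF_8));
                writeBytes(out, e.getValue().getBytes(StandardCharsets.UTF_8));
            }
        }
        writeLong(out, 0); // a zero-count block ends the map
        writeBytes(out, body);
        return out.toByteArray();
    }

    /** Avro "long": zigzag, then 7 bits per byte with a continuation bit. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Avro "bytes"/"string": length as a long, then the raw bytes. */
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }
}
```

Only the second form keeps the headers, which is why the serializer choice, not the DataStream file type, decides whether headers reach HDFS.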
>>
>>
>>
>> I've read the entire HDFS sink code and experimented with it a lot, so if
>> you want more details, let me know.
>>
>>
>>
>>
>>
>>
>> If you are writing to text or binary files on HDFS (i.e. you have set
>> hdfs.fileType = DataStream or CompressedStream in your config), then you
>> can supply your own custom serializer, which will allow you to write
>> headers to HDFS. You will need to write a serializer that implements
>> org.apache.flume.serialization.EventSerializer.
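As a sketch of the kind of output logic such a custom serializer might use, the hypothetical helper below turns an event's headers and body into one text line, `k1=v1,k2=v2<TAB>body\n`. The idea is that a serializer's `write(Event)` method would call it with `event.getHeaders()` and `event.getBody()` and write the result to the output stream the HDFS sink hands to the serializer; that wiring, the class name `HeaderLineFormat`, and the line format are all assumptions. It also assumes header keys, values, and bodies are UTF-8 text containing no `,`, `=`, tab, or newline characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical formatter a custom EventSerializer could delegate to.
final class HeaderLineFormat {

    /** Formats headers (sorted by key) and body as "k=v,...<TAB>body\n". */
    static byte[] format(Map<String, String> headers, byte[] body) {
        StringBuilder sb = new StringBuilder();
        // Copying into a TreeMap gives a stable, sorted header order.
        for (Map.Entry<String, String> e : new TreeMap<>(headers).entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        // Assumes the body is UTF-8 text; binary bodies would need encoding.
        sb.append('\t').append(new String(body, StandardCharsets.UTF_8)).append('\n');
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

A plain text line keeps the output readable with ordinary tools; if header values can contain the separator characters, an escaping scheme or a binary format such as Avro is the safer choice.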
>>
>> If, on the other hand, you are writing to HDFS SequenceFiles, then
>> unfortunately there is no way to customize the way that events are
>> serialized, so you cannot write event headers to HDFS. This is a known
>> issue (FLUME-1100) and I have supplied a patch to fix it.
>>
>> Chris.
>>
>>
>>
>>
>> On 2012/08/21 11:36, David Capwell wrote:
>>
>> I was wondering: if I put arbitrary data in an event's headers, can the
>> HDFSSink write it to HDFS? I know it can use the headers to split the data
>> into different paths, but what about writing the header data itself to
>> HDFS?
>>
>> Thanks for taking the time to read this email.
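The path splitting David mentions relies on `%{header}` escapes in `hdfs.path`. The Java sketch below is a simplified, hypothetical illustration of that idea (`HeaderPathSketch` is not Flume's implementation): it replaces each `%{name}` with the matching header value. It ignores Flume's other escapes such as timestamp patterns, and substituting an empty string for a missing header is an assumption of this sketch; check how your Flume version behaves.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified version of %{header} substitution in a sink path.
final class HeaderPathSketch {
    private static final Pattern HEADER_REF = Pattern.compile("%\\{([^}]+)\\}");

    /** Replaces each %{name} with headers.get(name), or "" when absent. */
    static String resolve(String template, Map<String, String> headers) {
        Matcher m = HEADER_REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = headers.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' or '\' in header values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This shows why headers influence where data lands but not what is written: the header values are consumed while building the path, and only the serializer decides whether they also appear in the file contents.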
>>
>>
>>
>>
>>
>>
>> This E-mail may contain confidential information and/or copyright
>> material. This email is intended for the use of the addressee only. If you
>> receive this email by mistake, please either delete it without reproducing,
>> distributing or retaining copies thereof or notify the sender immediately.
>>
>
>
