flume-user mailing list archives

From ashutosh (Open Platform Development Team) <sharma.ashut...@kt.com>
Subject RE: Can HDFSSink write headers as well?
Date Wed, 22 Aug 2012 03:16:53 GMT
Hi All,

I am using the "avro_event" serializer with hdfs.writeFormat = writable and hdfs.fileType = DataStream
to store events in HDFS.
I would like to read the resulting files for further analysis. I am new to Avro and do not know
how to write a deserializer that reads the Flume events stored in these HDFS files.

If anyone could share a sample or example, I would appreciate it.

Thanks & Regards,
Ashutosh Sharma

From: Bhaskar V. Karambelkar [mailto:bhaskarvk@gmail.com]
Sent: Wednesday, August 22, 2012 12:22 AM
To: user@flume.apache.org
Subject: Re: Can HDFSSink write headers as well?

On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <birchall@infoscience.co.jp> wrote:
Hi David,

Currently there is no way to write headers to HDFS using the built-in Flume functionality.

This is not entirely true. The following combination writes headers to HDFS in the Avro data
file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used is part of flume distribution viz.

A file written this way can be processed with the Avro MapReduce API that ships with the Avro distribution.
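
For a quick look at such a file outside MapReduce, here is a minimal sketch in Python (standard library only) that walks an Avro object container file and yields each event's headers and body. It rests on assumptions not stated in this thread: that the sink writes a standard Avro container file with no compression codec, and that each record is a map of string headers followed by a bytes body. The function names are made up for illustration. For real work the Avro libraries themselves (for example DataFileReader in the Java distribution) are the safer route.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(buf):
    """Decode one Avro long (zigzag-encoded varint) from a binary stream."""
    shift, accum = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("unexpected end of data")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(buf):
    """Decode Avro bytes: a long length followed by that many raw bytes."""
    return buf.read(read_long(buf))


def read_map(buf, read_value):
    """Decode an Avro map with string keys; values use read_value."""
    result = {}
    while True:
        count = read_long(buf)
        if count == 0:  # a zero count terminates the map
            return result
        if count < 0:  # negative count: a byte size follows, skip it
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            result[key] = read_value(buf)


def read_string(buf):
    return read_bytes(buf).decode("utf-8")


def read_flume_events(stream):
    """Yield (headers, body) pairs from an uncompressed Avro container file.

    Assumption: each record is {headers: map<string>, body: bytes}.
    The whole file is read into memory, so this suits small samples only.
    """
    data = stream.read()
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = read_map(buf, read_bytes)  # file metadata: schema, codec
    if meta.get("avro.codec", b"null") != b"null":
        raise ValueError("compressed blocks are not handled in this sketch")
    sync = buf.read(16)  # marker repeated after every data block
    while buf.tell() < len(data):
        count = read_long(buf)  # number of records in this block
        size = read_long(buf)   # size of the block payload in bytes
        block = io.BytesIO(buf.read(size))
        for _ in range(count):
            headers = read_map(block, read_string)
            body = read_bytes(block)
            yield headers, body
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
```

To try it, copy one of the sink's output files to local disk, open it in binary mode, and iterate over read_flume_events(f), printing each headers dict and body. If the file does not match the assumed layout, the function raises an error or produces garbage, which is a sign to check the schema stored in the file's metadata.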

Also note that using DataStream does not by itself mean the output is a text file; the serializer
and hdfs.writeFormat also determine whether the file is text or binary.

I've read the entire HDFS sink code and experimented with it a lot, so if you want more details,
let me know.

If you are writing to text or binary files on HDFS (i.e. you have set hdfs.fileType = DataStream
or CompressedStream in your config), then you can supply your own custom serializer, which
will allow you to write headers to HDFS. You will need to write a serializer that implements

If, on the other hand, you are writing to HDFS SequenceFiles, then unfortunately there is
no way to customize the way that events are serialized, so you cannot write event headers
to HDFS. This is a known issue (FLUME-1100) and I have supplied a patch to fix it.


On 2012/08/21 11:36, David Capwell wrote:
I was wondering: if I pass arbitrary data in an event's header, can the HDFSSink write it to HDFS?
I know it can use the headers to split the data into different paths, but what about writing
the header data to HDFS itself?

Thanks for taking the time to read this email.

This E-mail may contain confidential information and/or copyright material. This email is
intended for the use of the addressee only. If you receive this email by mistake, please either
delete it without reproducing, distributing or retaining copies thereof or notify the sender