flume-user mailing list archives

From Gumnaam Sur <gumnaam....@gmail.com>
Subject Re: HDFS Sink writeformat / filetype / serializer
Date Tue, 31 Jul 2012 15:13:42 GMT
To add to the question,

I've set up 4 HDFS sinks as follows:

a) seqaeSink ,  serializer = avro_event , fileType = SequenceFile
b) seqtSink ,  serializer = text , fileType = SequenceFile
c) dsaeSink ,  serializer = avro_event , fileType = DataStream
d) dstSink ,  serializer = text , fileType = DataStream , writeFormat = text
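
A sketch of how these four sinks could look in a Flume properties file. The agent name `a1` and the HDFS paths are placeholders that do not come from the setup described here, and source/channel wiring is left out; only the `hdfs.fileType`, `hdfs.writeFormat` and `serializer` lines reflect the combinations listed above.

```properties
# Hypothetical agent name "a1" and placeholder paths; channel wiring omitted.
a1.sinks = seqaeSink seqtSink dsaeSink dstSink

# a) avro_event serializer, SequenceFile output
a1.sinks.seqaeSink.type = hdfs
a1.sinks.seqaeSink.hdfs.path = /flume/seqae
a1.sinks.seqaeSink.hdfs.fileType = SequenceFile
a1.sinks.seqaeSink.serializer = avro_event

# b) text serializer, SequenceFile output
a1.sinks.seqtSink.type = hdfs
a1.sinks.seqtSink.hdfs.path = /flume/seqt
a1.sinks.seqtSink.hdfs.fileType = SequenceFile
a1.sinks.seqtSink.serializer = text

# c) avro_event serializer, DataStream output
a1.sinks.dsaeSink.type = hdfs
a1.sinks.dsaeSink.hdfs.path = /flume/dsae
a1.sinks.dsaeSink.hdfs.fileType = DataStream
a1.sinks.dsaeSink.serializer = avro_event

# d) text serializer, DataStream output, writeFormat = Text
a1.sinks.dstSink.type = hdfs
a1.sinks.dstSink.hdfs.path = /flume/dst
a1.sinks.dstSink.hdfs.fileType = DataStream
a1.sinks.dstSink.hdfs.writeFormat = Text
a1.sinks.dstSink.serializer = text
```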

The problem is that the seqae sink doesn't write AvroEvent objects. Instead it
writes a Sequence File of LongWritable, BytesWritable, and this is WRONG. The
Sequence File should be of AvroEvent.

The seqt sink works correctly: it writes a Sequence File of
LongWritable, BytesWritable.

The dsae sink writes a Data Stream file (each event separated by a newline) of
Avro Events.

The dst sink writes the plain message body to the file, and that's correct too.

So, in conclusion, the combination
 serializer = avro_event , fileType = SequenceFile
is not working as expected; it behaves just like the combination
 serializer = text , fileType = SequenceFile
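
One way to confirm which key and value classes a sink actually wrote is to read the SequenceFile header, which stores both class names right after the `SEQ` magic bytes and a version byte. The following is a minimal sketch, not a full reader: the function name is made up for illustration, it assumes each class name is shorter than 128 bytes (so its length fits in a single byte), and it ignores everything after the two class names.

```python
import io


def read_seqfile_classes(stream):
    """Return (version, key_class, value_class) from a SequenceFile header.

    Sketch only: assumes each class name is shorter than 128 bytes, so its
    length prefix is a single byte.
    """
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version_byte = stream.read(1)
    if not version_byte:
        raise ValueError("truncated header")
    version = version_byte[0]

    def read_class_name():
        length_byte = stream.read(1)
        if not length_byte:
            raise ValueError("truncated header")
        length = length_byte[0]
        if length > 127:
            raise ValueError("multi-byte length not handled in this sketch")
        name = stream.read(length)
        if len(name) != length:
            raise ValueError("truncated class name")
        return name.decode("utf-8")

    key_class = read_class_name()
    value_class = read_class_name()
    return version, key_class, value_class
```

To use it, copy one of the output files out of HDFS, open the local copy in binary mode, and pass the file object to `read_seqfile_classes`.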


On Tue, Jul 31, 2012 at 10:11 AM, Gumnaam Sur <gumnaam.sur@gmail.com> wrote:

> Hi,
> For HDFS Sink we have 3 properties which determine the type and content
> that gets written to the file.
>
> writeFormat = text | writable
> fileType = SequenceFile | DataStream | CompressedStream
> serializer = text | avro_event | <custom>
>
> Can one of the devs explain these in detail, along with the output expected
> from the various permutations / combinations of the 3 values, and whether any
> combination is invalid?
>
> e.g. what's the difference between the combo
> serializer = avro_event , fileType = SequenceFile
> and
> serializer = avro_event , fileType = DataStream
>
> What's the difference between writeFormat = 'text' and writeFormat =
> 'writable' ?
>
> To give some background, I am looking to serialize Avro Events to HDFS in a
> Sequence File, and I am trying to use org.apache.avro.mapreduce.* from my
> Hadoop jobs. I figure using SequenceFile should give better performance than
> text, but I am not exactly sure about the various Flume options I mentioned
> above.
>
> thanks
>
