flume-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: HDFS Sink writeformat / filetype / serializer
Date Tue, 31 Jul 2012 15:20:35 GMT
Could you report the behaviour that you consider invalid to the Flume JIRA (1)?

Also, please do not hesitate to submit patches for the user guide describing your findings, so that
others with the same questions do not have to go through the same exercise.

Jarcec
 
1: https://issues.apache.org/jira/browse/FLUME 

On Tue, Jul 31, 2012 at 11:13:42AM -0400, Gumnaam Sur wrote:
> To add to the question,
> 
> I've set up 4 HDFS sinks as follows:
> 
> a) seqaeSink, serializer = avro_event, fileType = SequenceFile
> b) seqtSink, serializer = text, fileType = SequenceFile
> c) dsaeSink, serializer = avro_event, fileType = DataStream
> d) dstSink, serializer = text, fileType = DataStream, writeFormat = text
> 
> The problem is that the seqae sink doesn't write AvroEvent objects; instead it
> writes a Sequence File of LongWritable, BytesWritable, and this is WRONG. The
> Sequence File should be of AvroEvent.
> 
> The seqt sink works correctly, as in it writes a sequence File of
> LongWritable, BytesWritable.
> 
> The dsae sink writes a Data Stream file (each event separated by a newline) of
> Avro Events.
> 
> The dst sink writes the plain message body to the file, and that's correct too.
> 
> So in conclusion, the combination
>  serializer = avro_event, fileType = SequenceFile
> is not working as expected; it behaves just like the combination
>  serializer = text, fileType = SequenceFile
> 
> 
> On Tue, Jul 31, 2012 at 10:11 AM, Gumnaam Sur <gumnaam.sur@gmail.com> wrote:
> 
> > Hi,
> > For HDFS Sink we have 3 properties which determine the type and content
> > that gets written to the file.
> >
> > writeFormat = text | writable
> > fileType = SequenceFile | DataStream | CompressedStream
> > serializer = text | avro_event | <custom>
> >
> > Can one of the devs explain these in detail, including the output expected
> > from the various permutations / combinations of the 3 values, and whether any
> > combination is invalid?
> >
> > e.g. what's the difference between the combo
> > serializer = avro_event , fileType = SequenceFile
> > and
> > serializer = avro_event , fileType = DataStream
> >
> > What's the difference between writeFormat = 'text' and writeFormat =
> > 'writable'?
> >
> > To give some background, I am looking to serialize Avro Events to HDFS in a
> > Sequence File, and I am trying to use org.apache.avro.mapreduce.* from my
> > Hadoop jobs. I figure using SequenceFile should give better performance than
> > text, but I am not exactly sure of the various Flume options I mentioned above.
> >
> > thanks
> >
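
For readers following the thread, here is a minimal sketch of how the four sinks described above might be declared in an agent's properties file. The agent name `agent1`, the channel names `c1` to `c4`, and the HDFS paths are assumptions for illustration and do not come from the thread. The sink names and the fileType / serializer / writeFormat values are the ones quoted above. Source and channel definitions are omitted.

```properties
# Hypothetical agent "agent1"; channel names and paths are placeholders.
# Each sink gets its own channel, on the assumption that every sink
# should receive a copy of every event.
agent1.sinks = seqaeSink seqtSink dsaeSink dstSink

# a) SequenceFile + avro_event
#    (reported in this thread to produce LongWritable/BytesWritable records)
agent1.sinks.seqaeSink.type = hdfs
agent1.sinks.seqaeSink.channel = c1
agent1.sinks.seqaeSink.hdfs.path = /flume/seqae
agent1.sinks.seqaeSink.hdfs.fileType = SequenceFile
agent1.sinks.seqaeSink.serializer = avro_event

# b) SequenceFile + text
agent1.sinks.seqtSink.type = hdfs
agent1.sinks.seqtSink.channel = c2
agent1.sinks.seqtSink.hdfs.path = /flume/seqt
agent1.sinks.seqtSink.hdfs.fileType = SequenceFile
agent1.sinks.seqtSink.serializer = text

# c) DataStream + avro_event
agent1.sinks.dsaeSink.type = hdfs
agent1.sinks.dsaeSink.channel = c3
agent1.sinks.dsaeSink.hdfs.path = /flume/dsae
agent1.sinks.dsaeSink.hdfs.fileType = DataStream
agent1.sinks.dsaeSink.serializer = avro_event

# d) DataStream + text, with writeFormat set explicitly
agent1.sinks.dstSink.type = hdfs
agent1.sinks.dstSink.channel = c4
agent1.sinks.dstSink.hdfs.path = /flume/dst
agent1.sinks.dstSink.hdfs.fileType = DataStream
agent1.sinks.dstSink.hdfs.writeFormat = Text
agent1.sinks.dstSink.serializer = text
```

Note that `fileType` and `writeFormat` are set under the `hdfs.` prefix, while `serializer` is set directly on the sink.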
