flume-user mailing list archives

From Gonzalo Herreros <gherre...@gmail.com>
Subject Re: Re: How to customize the key in a HDFS SequenceFile sink
Date Tue, 08 Sep 2015 13:34:51 GMT
Looking at the code, this sink is a bit different: the "serializer"
property doesn't seem to be used.

I see two options:

Either configure hdfs.writeFormat so that, instead of one of the built-in
SequenceFileSerializerType values, the sink uses your own implementation of
SequenceFileSerializer.

Or extend HDFSEventSink and pass to its constructor an extension of
HDFSWriterFactory that, when asked for the SequenceFile writer, returns an
extension of HDFSSequenceFile in which you have overridden the "append"
method to build the key however you want.
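
A minimal sketch of the key-building part of the first option, kept free of
Flume and Hadoop dependencies so that it runs on its own. KeyedRecord and
CustomKeySerializer are hypothetical stand-ins, not Flume classes, and the
"recordKey" header name is an assumption for illustration. The fallback to the
"timestamp" header is meant to mirror what the stock serializers appear to do,
which should be checked against the Flume source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one key/value pair handed to the SequenceFile writer.
final class KeyedRecord {
  final String key;
  final byte[] value;

  KeyedRecord(String key, byte[] value) {
    this.key = key;
    this.value = value;
  }
}

// Hypothetical serializer that takes the key from an event header
// instead of always using the timestamp.
final class CustomKeySerializer {
  private final String keyHeader;

  CustomKeySerializer(String keyHeader) {
    this.keyHeader = keyHeader;
  }

  // Use the configured header as the key; fall back to the "timestamp"
  // header, and then to the current time, when it is missing or empty.
  String buildKey(Map<String, String> headers) {
    String key = headers.get(keyHeader);
    if (key != null && !key.isEmpty()) {
      return key;
    }
    String timestamp = headers.get("timestamp");
    return timestamp != null ? timestamp : Long.toString(System.currentTimeMillis());
  }

  // One event becomes one record whose key is the custom string.
  List<KeyedRecord> serialize(Map<String, String> headers, byte[] body) {
    return Collections.singletonList(new KeyedRecord(buildKey(headers), body));
  }
}
```

In a real implementation the same logic would presumably live in the
serialize method of a class implementing SequenceFileSerializer, with the key
wrapped in a Hadoop Writable such as Text, and hdfs.writeFormat would point at
that class's builder.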


Regards,
Gonzalo

On 8 September 2015 at 13:14, <Thomas.Beer@continental-corporation.com>
wrote:

>
>
>
> From:        Gonzalo Herreros <gherreros@gmail.com>
> To:        user@flume.apache.org
> Date:        08.09.2015 09:29
> Subject:        Re: How to customize the key in a HDFS SequenceFile sink
> ------------------------------
>
> Thanks for your prompt reply. Could you give me some more details?
> I'm a little confused, as I've read that the "hdfs.serializer" parameter is
> ignored when using sequence files.
> Does it mean that my custom serializer is responsible for writing
> "correct" SequenceFiles (e.g. using "createWriter" of
> org.apache.hadoop.io.SequenceFile)?
>
> I assume that I have to do the following (see pseudocode below):
>
> 1)
> agent configuration:
> hdfs.fileType = DataStream
> hdfs.serializer = MyBuilder
>
>
> 2)
> public class MySerializer implements EventSerializer {
>   // customize the key and write to the OutputStream using the createWriter
>   // method
> }
>
> 3)
> public static class MyBuilder implements EventSerializer.Builder {
>   public EventSerializer build(Context context, OutputStream os) {
>     return new MySerializer(context, os);
>   }
> }
>
> Thanks a lot for your support.
>
>
> I would implement a custom serializer and configure it in the standard
> HDFS sink.
> That way you control how you build the key for each event.
>
> Regards,
> Gonzalo
>
> On 8 September 2015 at 06:42, <Thomas.Beer@continental-corporation.com>
> wrote:
>
> Hello,
>
> I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm
> looking for a way to create "custom keys". By default, Flume uses the
> timestamp as the key within a SequenceFile. However, in my use case I
> would like to use a custom string as the key (instead of the timestamp).
>
> What are best practices for implementing/configuring such a "custom key"
> within Flume?
>
> Best, Thomas
>
>
>
