flume-user mailing list archives

From DSuiter RDX <dsui...@rdx.com>
Subject Re: Problem aggregating syslogTCP > avro > HDFS
Date Mon, 07 Oct 2013 16:12:35 GMT
Ok, I just realized that my rollSize is too small (it is specified in
bytes), and it is probably doing exactly what it is supposed to, since I
told it to close the file at 3 KB, not 3 MB...
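
For anyone who finds this thread later, here is a minimal sketch of the roll settings as they were probably intended. It assumes that hdfs.rollSize is measured in bytes and that 3 MB means 3 * 1024 * 1024 bytes. The agent and sink names are the ones from the config quoted below, and only the rollSize value is changed.

```properties
# hdfs.rollSize is a byte count: 3 MB = 3 * 1024 * 1024 = 3145728
# (the original value of 3072 is only 3 KB)
rtlv2.sinks.snklv2.hdfs.rollInterval=0
rtlv2.sinks.snklv2.hdfs.rollSize=3145728
rtlv2.sinks.snklv2.hdfs.rollCount=0
```

With rollInterval and rollCount left at 0, file size should be the only one of these three triggers that closes a file.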

Sorry everyone!

Thanks!
*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Oct 7, 2013 at 12:00 PM, DSuiter RDX <dsuiter@rdx.com> wrote:

> Hi, this may be a problem with our understanding, or my configuration.
>
> I am trying to take data from rsyslog via remote forwarding over TCP into
> a syslogTCP source, send it out through an avro sink, connect the avro sink
> to an avro source, and then write it through an HDFS sink.
>
> Everything is connected and the data is flowing from the remote source
> into HDFS in an avro container, so that is not the problem.
>
> The problem is that it is closing files when they are very small, only KBs
> in size, even though I have the hdfs.rollInterval and hdfs.rollCount
> properties set to 0. I set the hdfs.rollSize property to 3072 for 3 MB. I expected it
> to aggregate the files into larger blocks before closing them. Is this
> happening because of the HDFS directory-building escape sequences forcing
> new directory writes and making new files prematurely?
>
> Here are my agent configs:
>
> syslogTCP Source > Avro Sink (first tier, pretty sure everything is ok
> here but maybe not)
>
> ####RT Listener Agent####
> rtlv1.sources=srclv1
> rtlv1.sinks=snklv1
> rtlv1.channels=chnlv1
>
> #sources
> rtlv1.sources.srclv1.type=syslogtcp
> rtlv1.sources.srclv1.host=192.168.1.2
> rtlv1.sources.srclv1.port=5140
> rtlv1.sources.srclv1.channels=chnlv1
>
> #channels
> rtlv1.channels.chnlv1.type=memory
> rtlv1.channels.chnlv1.capacity=1500
> rtlv1.channels.chnlv1.transactionCapacity=1500
>
> #sinks
> rtlv1.sinks.snklv1.type=avro
> rtlv1.sinks.snklv1.hostname=192.168.1.2
> rtlv1.sinks.snklv1.port=5141
> rtlv1.sinks.snklv1.batch-size=1500
> rtlv1.sinks.snklv1.channel=chnlv1
>
> Avro Source > HDFS (second tier)
>
> ####RT Aggregate Writer Agent####
> rtlv2.sources=srclv2
> rtlv2.sinks=snklv2
> rtlv2.channels=chnlv2
>
> #sources
> rtlv2.sources.srclv2.type=avro
> rtlv2.sources.srclv2.bind=192.168.1.2
> rtlv2.sources.srclv2.port=5141
> rtlv2.sources.srclv2.channels=chnlv2
>
> #channels
> rtlv2.channels.chnlv2.type=memory
> rtlv2.channels.chnlv2.capacity=1500
> rtlv2.channels.chnlv2.transactionCapacity=1500
>
> #sinks
> rtlv2.sinks.snklv2.type=hdfs
> rtlv2.sinks.snklv2.channel=chnlv2
> rtlv2.sinks.snklv2.hdfs.path=/user/flume/avro/%y-%m-%d/%H%M
> rtlv2.sinks.snklv2.hdfs.fileSuffix=.avro
> rtlv2.sinks.snklv2.serializer=avro_event
> rtlv2.sinks.snklv2.hdfs.fileType=DataStream
> rtlv2.sinks.snklv2.hdfs.rollInterval=0
> rtlv2.sinks.snklv2.hdfs.rollSize=3072
> rtlv2.sinks.snklv2.hdfs.batchSize=1500
> rtlv2.sinks.snklv2.hdfs.rollCount=0
> rtlv2.sinks.snklv2.hdfs.round=true
> rtlv2.sinks.snklv2.hdfs.roundValue=10
> rtlv2.sinks.snklv2.hdfs.roundUnit=minute
>
> Thanks!
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
