flume-user mailing list archives

From lohit <lohit.vijayar...@gmail.com>
Subject Re: HDFS Sink performance
Date Wed, 15 Jul 2015 18:19:37 GMT
Thanks for the reply, Hari. Multiple sinks make sense, but this would also
mean there are a lot more files on HDFS. I will try multiple sinks and see how
fast this can go.
Given that a single HDFS stream can do much higher throughput, maybe there
is a way to have a thread pool for SinkRunner-PollingRunner-DefaultSinkProcessor
instead of a single thread per sink.
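
For reference, a minimal sketch of what the multi-sink setup might look like
with our config, assuming two sinks. The sink names (hdfsSink1, hdfsSink2) and
the prefix values are made up for illustration, and the remaining hdfs.*
settings from the original config would be repeated for each sink:

# Hypothetical sink names; both sinks drain the same channel.
agent.sinks = hdfsSink1 hdfsSink2

agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = memoryChannel
agent.sinks.hdfsSink1.hdfs.path = /tmp/lohit
# Distinct prefix per sink so the two writers never open the same file.
agent.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel
agent.sinks.hdfsSink2.hdfs.path = /tmp/lohit
agent.sinks.hdfsSink2.hdfs.filePrefix = sink2

With N sinks configured this way, each roll produces N files instead of one,
which is the file-count concern mentioned above.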

2015-07-15 11:11 GMT-07:00 Hari Shreedharan <hshreedharan@cloudera.com>:

> Hi Lohit,
>
> HDFS sinks (in fact, most sinks) are single-threaded by design. This is
> meant to make writing the sinks easier, but all channels can handle
> multiple sinks reading from them. So to improve the efficiency, you
> basically configure several sinks which read off the same channel. Make
> sure, though, that each sink writes to files with different HDFS paths or
> different file prefixes (else the HDFS client API will complain about leases).
>
>
> Thanks,
> Hari
>
> On Wed, Jul 15, 2015 at 9:10 AM, lohit <lohit.vijayarenu@gmail.com> wrote:
>
>> Hello,
>>
>> Does anyone have some numbers they can share on HDFS sink
>> performance? In our testing, a single sink writing to HDFS
>> (CompressedStream) and reading from a MemoryChannel can only do about 35000
>> events per second (each event is about 1K in size). After compression this
>> comes to a ~10MB/s write stream to the HDFS file, which is pretty low. Our
>> configuration looks like this:
>>
>> agent.sinks.hdfsSink.type = hdfs
>> agent.sinks.hdfsSink.channel = memoryChannel
>> agent.sinks.hdfsSink.hdfs.path = /tmp/lohit
>> agent.sinks.hdfsSink.hdfs.codeC = lzo
>> agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
>> agent.sinks.hdfsSink.hdfs.writeFormat = Writable
>> agent.sinks.hdfsSink.hdfs.rollInterval = 3600
>> agent.sinks.hdfsSink.hdfs.rollSize = 1073741824
>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>> agent.sinks.hdfsSink.hdfs.batchSize = 10000
>> agent.sinks.hdfsSink.hdfs.txnEventMax = 10000
>>
>> agent.channels.memoryChannel.type = memory
>>
>> agent.channels.memoryChannel.capacity = 3000000
>> agent.channels.memoryChannel.transactionCapacity = 10000
>>
>> --
>> Have a Nice Day!
>> Lohit
>>
>
>


-- 
Have a Nice Day!
Lohit
