flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Flume throughput correlation with RAM
Date Tue, 09 Oct 2012 14:31:44 GMT
Hi,

With the file channel, the number and type of disks will be a much
better predictor of performance than CPU or RAM. Note that
consumer-level drives/controllers will give you much "better"
performance because they lie to you about when your data is actually
written to the drive. If you search for "fsync lies" you'll find more
information on this.

You probably want to increase the batch size to get better performance.
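
As a rough sketch of what that could look like for the two agents
described in the quoted message: the agent and component names (src,
dst, fc, avroSink, hdfsSink), the directory paths, and the value 1000
are all hypothetical and untuned, and the fragment leaves out the rest
of the agent definition (sources, hostnames, ports, HDFS path). It
assumes the Avro sink's batch-size, the HDFS sink's hdfs.batchSize, and
the file channel's transactionCapacity and dataDirs properties; check
the user guide for your Flume version before relying on them.

```properties
# Source agent: file channel feeding an Avro sink.
# All names, paths, and numbers here are illustrative assumptions.
src.channels.fc.type = file
# Spreading checkpoint and data directories over separate physical disks
# is what lets the file channel benefit from more spindles.
src.channels.fc.checkpointDir = /disk1/flume/checkpoint
src.channels.fc.dataDirs = /disk2/flume/data,/disk3/flume/data
# A sink batch must fit inside one channel transaction.
src.channels.fc.transactionCapacity = 1000
src.sinks.avroSink.type = avro
src.sinks.avroSink.channel = fc
src.sinks.avroSink.batch-size = 1000

# Destination agent: file channel feeding an HDFS sink.
dst.channels.fc.type = file
dst.channels.fc.transactionCapacity = 1000
dst.sinks.hdfsSink.type = hdfs
dst.sinks.hdfsSink.channel = fc
dst.sinks.hdfsSink.hdfs.batchSize = 1000
```

Keep each batch size at or below the channel's transactionCapacity;
larger batches mean fewer, bigger commits to disk, at the cost of more
events being replayed if a transaction fails.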

Brock

On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:
> Hi
>
> My flume setup is:
>
> Source Agent : cat source - File Channel - Avro Sink
> Dest Agent :     avro source - File Channel - HDFS Sink.
>
> There is only 1 source agent and 1 destination agent.
>
> I measure throughput as amount of data written to HDFS per second.
> (I have a rolling interval of 30 sec, so if a 60 MB file is generated
> in 30 sec, the throughput is 2 MB/sec.)
>
> I have run source agent on various machines with different hardware
> configurations :
> (In all cases I run flume agent with JAVA OPTIONS as
> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
> -XX:MaxDirectMemorySize=2g")
>
> JDK is 32 bit.
>
> Experiment 1:
> =====
> RAM : 16 GB
> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
> 64 bit Processor with 64 bit Kernel.
> Throughput: 2 MB/sec
>
> Experiment 2:
> ======
> RAM : 4 GB
> Processor: Intel Xeon E5504 @ 2.00 GHz (4 cores).
> 64 bit Processor with 32 bit Kernel.
> Throughput : 30 KB/sec
>
> Experiment 3:
> ======
> RAM : 8 GB
> Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
> 64 bit Processor with 32 bit Kernel.
> Throughput : 80 KB/sec
>
> -- As can be seen, there is a huge difference in throughput with the
> same configuration but different hardware.
> -- In the first case, where throughput is higher, RES is around 160 MB;
> in the other cases it is in the range of 40 MB - 50 MB.
>
> Can anybody please give insight into why there is this huge difference
> in throughput?
> What is the correlation between RAM and file channel/HDFS sink
> performance, and also with a 32-bit/64-bit kernel?
>
> Regards,
> Jagadish



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
