flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Flume throughput correlation with RAM
Date Wed, 10 Oct 2012 15:54:35 GMT
How big are your events? Average about 400 bytes?
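
For a sense of scale, here is a rough sketch of what the throughputs reported below would mean in events per second if events really are about 400 bytes. That size is only my guess, not a measurement, and the function name is made up for illustration.

```python
def events_per_second(throughput_bytes_per_sec, event_size_bytes=400):
    """Convert a byte throughput into an approximate event rate."""
    return throughput_bytes_per_sec / event_size_bytes

# Figures reported in this thread, using decimal units (1 KB = 1000 bytes).
slow = events_per_second(40 * 1000)        # 40 KB/sec
fast = events_per_second(2 * 1000 * 1000)  # 2 MB/sec

print(f"40 KB/sec is about {slow:.0f} events/sec")
print(f"2 MB/sec is about {fast:.0f} events/sec")
```

If the real event size is different, pass it as the second argument.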

Brock

On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
<jagadish.bihani@pubmatic.com> wrote:
> Hi
>
> Thanks for the input, Brock. After several experiments, the
> problem eventually boiled down to the disks.
>
> -- But I used the same configuration on all 3 machines (so all software
> components are the same).
> -- The User Guide says that if multiple file channel instances are
> active on the same agent, then different disks are preferable. But in my
> case only one file channel is active per agent.
> -- The only pattern I observed is that the machines where I got better
> performance have multiple disks. But I don't understand how that helps
> if I have only 1 active file channel.
> -- What is the impact of the type of disk/disk device driver on performance?
> I mean, I don't understand why I get 40 KB/sec with one disk and
> 2 MB/sec with another.
>
> Could you please elaborate on the correlation between the file channel and disks?
>
> Regards,
> Jagadish
>
>
> On 10/09/2012 08:01 PM, Brock Noland wrote:
>
> Hi,
>
> With the file channel, the number and type of disks is going to be
> much more predictive of performance than CPU or RAM. Note that
> consumer level drives/controllers will give you much "better"
> performance because they lie to you about when your data is
> actually written to the drive. If you search for "fsync lies" you'll
> find more information on this.
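
One rough way to compare disks is to time small synced writes on each machine. The following is a minimal sketch, not part of Flume: the function name, record size and count are assumptions chosen for illustration, and the directory should be changed to wherever the channel data actually lives.

```python
import os
import tempfile
import time

def time_fsyncs(directory, count=50, payload=b"x" * 400):
    """Append `count` small records, calling fsync after each one.

    Returns the average seconds per write+fsync.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this file to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / count

# Assumption: the system temp directory stands in for the channel's data directory.
avg = time_fsyncs(tempfile.gettempdir())
print(f"average write+fsync: {avg * 1000:.3f} ms")
```

A drive that reports very low times here may be acknowledging writes from its cache rather than from the platter, which is the "lying" described above.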
>
> You probably want to increase the batch size to get better performance.
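
To illustrate why batch size matters, here is a toy model. It assumes each committed batch costs exactly one sync of fixed latency and ignores everything else; the 10 ms latency and 400-byte event size are made-up numbers, not measurements from Flume.

```python
def modeled_throughput(batch_size, event_size_bytes=400, sync_latency_sec=0.010):
    """Bytes per second if each batch of events costs exactly one sync."""
    return batch_size * event_size_bytes / sync_latency_sec

for batch_size in (1, 10, 100, 1000):
    rate = modeled_throughput(batch_size)
    print(f"batch size {batch_size:>4}: {rate / 1000:.0f} KB/sec")
```

Under this model throughput grows linearly with batch size, because the cost of each sync is spread over more events. Real systems level off once something other than the sync becomes the bottleneck.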
>
> Brock
>
> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
> <jagadish.bihani@pubmatic.com> wrote:
>
> Hi
>
> My flume setup is:
>
> Source Agent : cat source - File Channel - Avro Sink
> Dest Agent :     avro source - File Channel - HDFS Sink.
>
> There is only 1 source agent and 1 destination agent.
>
> I measure throughput as the amount of data written to HDFS per second.
> (I have a rolling interval of 30 sec, so if a 60 MB file is generated in
> 30 sec, the throughput is 2 MB/sec.)
>
> I have run the source agent on various machines with different hardware
> configurations.
> (In all cases I run the Flume agent with Java options set as
> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
> -XX:MaxDirectMemorySize=2g")
>
> JDK is 32 bit.
>
> Experiment 1:
> =====
> RAM : 16 GB
> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
> 64 bit Processor with 64 bit Kernel.
> Throughput: 2 MB/sec
>
> Experiment 2:
> ======
> RAM : 4 GB
> Processor: Intel Xeon E5504 @ 2.00 GHz (4 cores).
> 64 bit Processor with 32 bit Kernel.
> Throughput : 30 KB/sec
>
> Experiment 3:
> ======
> RAM : 8 GB
> Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
> 64 bit Processor with 32 bit Kernel.
> Throughput : 80 KB/sec
>
> -- As can be seen, there is a huge difference in throughput with the same
> configuration but different hardware.
> -- In the first case, where throughput is higher, RES is around 160 MB; in
> the other cases it is in the range of 40 MB - 50 MB.
>
> Can anybody please give some insight into why there is this huge
> difference in throughput?
> What is the correlation between RAM and file channel/HDFS sink
> performance, and also with a 32-bit vs. 64-bit kernel?
>
> Regards,
> Jagadish
>
>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
