hadoop-common-user mailing list archives

From Mahesh Balija <balijamahesh....@gmail.com>
Subject Re: understanding performance
Date Tue, 04 Dec 2012 08:16:14 GMT
Hi Peter,

           Can you also track details such as which nodes your
mappers/reducers run on in each execution?
           Since your data is replicated across different nodes, the
JobTracker may schedule your tasks on different nodes in the cluster
each time the job runs.

          This is one possible reason for the fluctuations in your
job performance.
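To make that comparison concrete, here is a minimal sketch of tallying task
attempts per node for two runs and spotting skew. The attempt IDs and host
names are invented for illustration; in practice you would read the real
assignments off the JobTracker web UI or the job history for each run.

```python
from collections import Counter

def hosts_per_run(attempts):
    """Count how many task attempts each node handled in one run.

    `attempts` is a list of (attempt_id, host) pairs collected from the
    JobTracker UI or job history; the values used below are illustrative.
    """
    return Counter(host for _, host in attempts)

# Hypothetical assignments from two runs of the same job:
fast_run = [("attempt_m_000000_0", "node1"), ("attempt_m_000001_0", "node2"),
            ("attempt_m_000002_0", "node3"), ("attempt_m_000003_0", "node4")]
slow_run = [("attempt_m_000000_0", "node1"), ("attempt_m_000001_0", "node1"),
            ("attempt_m_000002_0", "node1"), ("attempt_m_000003_0", "node2")]

print(hosts_per_run(fast_run))  # maps spread evenly across 4 nodes
print(hosts_per_run(slow_run))  # skewed: node1 ran 3 of the 4 maps
```

If the slow runs consistently pile more tasks onto fewer nodes (e.g. because
of where the replicas sit), that alone can explain a 2x swing in map time.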

Mahesh Balija,
Calsoft Labs.
On Mon, Dec 3, 2012 at 8:57 PM, Cogan, Peter (Peter) <
Peter.Cogan@alcatel-lucent.com> wrote:

>  Hi there,
> I've been doing some performance testing with hadoop and have been
> experiencing highly variable results which I am trying to understand. I've
> been examining how long it takes to perform a particular MR job, and am
> finding that the time taken varies by a factor of 2 when I repeat the job.
> Note that the data, algorithm, cluster, etc. are completely the same (and
> I am the only person on the cluster).
> The way I do the test is from a simple shell script that just runs the job
> again and again. I find that the job is as fast as 5 mins, but as slow as
> 10 mins, with everything in between.
> I've examined the output of two log files, where I can see that the
> performance difference is coming from the map and shuffle phases. For a
> sample 'fast' job, the map phases take on average 2 mins 34 secs, whereas
> for a sample 'slow' job the phases take on average 4 mins 12 secs.
> Interestingly, if I then look at the counters for random maps (one each
> from the fast and slow jobs) then I find that all counters are pretty much
> equal – *including* CPU time. This suggests that the slowdown comes from
> bottlenecks at disk I/O or network. Since I am the only user on the network
> (it's a dedicated GB switch) and the only one using the disks, I don't
> understand what can be happening. Also, the total data is not that huge –
> the job analyses 21GB with replication 2 spread across 8 disks on 4 nodes.
> The total disk output from the reducers is about 300MB. I'm not sure how to
> investigate further – is there some other diagnostic within hadoop that can
> tell me where the code is waiting (e.g. for network or disk I/O) – or
> perhaps some system tool that can indicate performance hits in specific
> places?
> Thanks for any suggestions
> Peter
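On the system-tool side of Peter's question, one common approach is to run
`iostat -x` (from the sysstat package) on each worker node during fast and
slow runs and compare the `%util` column, which shows how saturated each
disk was. Below is a small sketch that pulls the peak `%util` per device out
of such output; the sample lines are invented to mimic `iostat -x` format,
and the parsing is deliberately simplified (device lines only, no avg-cpu
sections).

```python
def max_util(iostat_lines):
    """Return the highest %util seen per device in `iostat -x` output.

    Simplified parser: assumes each data line starts with a device name
    and ends with the %util value, and skips header lines.
    """
    peak = {}
    for line in iostat_lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] in ("Device:", "avg-cpu:"):
            continue  # skip headers
        try:
            util = float(parts[-1])  # %util is the last column
        except ValueError:
            continue  # not a data line
        dev = parts[0]
        peak[dev] = max(peak.get(dev, 0.0), util)
    return peak

# Invented sample in iostat -x layout:
sample = [
    "Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util",
    "sda        0.00   4.00  1.00  9.00   16.00  104.00    12.00     0.05   5.00   1.24  12.40",
    "sda        0.00   8.00  2.00 80.00   32.00 2048.00    25.37     4.90  59.70  12.03  98.70",
]
print(max_util(sample))  # a peak near 100% suggests the disk is the bottleneck
```

If `%util` sits near 100% only during the slow runs (or only on some nodes),
that points at disk contention rather than CPU; `sar -n DEV` gives the
analogous view for the network interfaces.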
