hadoop-hdfs-user mailing list archives

From Gaurav Dasgupta <gdsay...@gmail.com>
Subject Re: How to reduce total shuffle time
Date Wed, 29 Aug 2012 04:33:57 GMT

Thanks for your replies. I will try the recommended suggestions and report
back.


To calculate the Total Shuffle Time:

1. In the JobTracker Web UI -> Job Tracker History, open the specific job.
   Go to the Reduce Task List and open the first reduce task attempt. Note
   its start time: this is when the shuffle (part of the reduce phase)
   actually starts.
2. Go back to JobTracker Main Page -> Job Tracker History -> the same job.
   Click on "Analyse This Job" and scroll down to the "Last Shuffle Finish
   Time".
3. The difference between the two times is the job's Total Shuffle Time.
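
As a rough illustration of the subtraction described above, here is a minimal Python sketch. The function name, the timestamp layout in TIME_FORMAT, and the sample values are all assumptions for illustration only; adjust the format string to whatever your JobTracker UI actually displays.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout (hypothetical); change it to match the
# strings shown in your JobTracker history pages.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def total_shuffle_time(first_reduce_start: str,
                       last_shuffle_finish: str,
                       fmt: str = TIME_FORMAT) -> timedelta:
    """Return the gap between the first reduce attempt's start time
    and the job's Last Shuffle Finish Time."""
    start = datetime.strptime(first_reduce_start, fmt)
    finish = datetime.strptime(last_shuffle_finish, fmt)
    if finish < start:
        # Guard against the two values being swapped when copied by hand.
        raise ValueError("last shuffle finish is earlier than reduce start")
    return finish - start


# Made-up example values, not taken from any real job.
shuffle = total_shuffle_time("2012-08-28 10:00:05", "2012-08-28 10:12:35")
print(shuffle)                  # 0:12:30
print(shuffle.total_seconds())  # 750.0
```

Dividing the result by the job's total run time gives the share of the job spent in shuffle.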
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 12:57 AM, abhiTowson cal

> Hi Gaurav,
> Can you tell me how you calculated the total shuffle time? Apart from
> combiners and compression, you can also use some shuffle-sort
> parameters that might increase performance; I am not sure exactly
> which parameters to tweak. Please share if you come across any other
> techniques, I am very interested.
> Regards,
> Abhi
> On Tue, Aug 28, 2012 at 3:16 AM, Gaurav Dasgupta <gdsayshi@gmail.com>
> wrote:
> > Hi,
> >
> > I have run some large and small jobs and calculated the Total Shuffle Time
> > for the jobs. I can see that the Total Shuffle Time is almost half the Total
> > Time which was taken by the full job to complete.
> >
> > My question, here, is that how can we decrease the Total Shuffle Time? And
> > doing so, what will be its effect on the Job?
> >
> > Thanks,
> > Gaurav Dasgupta
