hadoop-common-user mailing list archives

From Raj V <rajv...@yahoo.com>
Subject Re: TeraSort question.
Date Thu, 13 Jan 2011 16:51:22 GMT

Let me plot the graphs for all the nodes. I picked 6 random nodes out of 480, and 2 of
these were really busy and the other 4 were idle. Either that makes me very lucky or the cluster
was underutilized.
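
The "lucky or underutilized" question can be put in rough numbers. The sketch below is an illustration only, not something from the thread: it assumes each of the 480 nodes is simply either idle or busy, and uses the hypergeometric distribution to ask how likely it is that a random sample of 6 nodes contains at least 4 idle ones. The function names and the idle fractions tried are hypothetical.

```python
from math import comb


def prob_exactly_idle(total_nodes, idle_nodes, sample_size, idle_seen):
    """Hypergeometric probability that a random sample of `sample_size`
    nodes contains exactly `idle_seen` idle nodes."""
    busy_nodes = total_nodes - idle_nodes
    busy_seen = sample_size - idle_seen
    ways = comb(idle_nodes, idle_seen) * comb(busy_nodes, busy_seen)
    return ways / comb(total_nodes, sample_size)


def prob_at_least_idle(total_nodes, idle_nodes, sample_size, min_idle_seen):
    """Probability of seeing `min_idle_seen` or more idle nodes in the sample."""
    return sum(
        prob_exactly_idle(total_nodes, idle_nodes, sample_size, k)
        for k in range(min_idle_seen, sample_size + 1)
    )


if __name__ == "__main__":
    # Assumed idle fractions, chosen only to show how the answer moves.
    for idle_fraction in (0.10, 0.25, 0.50, 0.67):
        idle_nodes = round(480 * idle_fraction)
        p = prob_at_least_idle(480, idle_nodes, 6, 4)
        print(f"{idle_nodes:3d} idle of 480 -> P(at least 4 of 6 idle) = {p:.4f}")
```

If the probability is small for low idle fractions, the observed sample points toward a largely idle cluster rather than an unlucky draw.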

I would have found it acceptable if different nodes were utilized in different ways, but in
my case, 2 nodes had serious CPU, network and disk activity and others were completely

From: Steve Loughran <stevel@apache.org>
To: common-user@hadoop.apache.org
Sent: Thursday, January 13, 2011 3:05 AM
Subject: Re: TeraSort question.

On 11/01/11 16:40, Raj V wrote:
> Ted
> Thanks. I have all the graphs I need, including the map/reduce timeline and system activity
> for all the nodes when the sort was running. I will publish them once I have them in some
> presentable format.
> For legal reasons, I really don't want to send the complete job history files.
> My question is still this: when running terasort, would the CPU, disk and network utilization
> of all the nodes be more or less similar, or completely different?

They can be different. The JT (JobTracker) pushes out work to machines when they report in; some
may get more work than others, and so generate more local data. This will have follow-on consequences.
In a live system things are different, as the work tends to follow the data, so machines with
(or near) the data you need get the work.
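
As a rough illustration of how "work goes to whoever reports in" can leave load uneven, here is a toy simulation. It is a sketch under stated assumptions, not Hadoop's actual scheduler: nodes report in at random times, each has a fixed number of task slots, and a reporting node is handed one task per free slot until the tasks run out. The function name, slot count, task durations and task count are all hypothetical.

```python
import random


def simulate_assignment(num_nodes, num_tasks, slots_per_node=2,
                        mean_task_time=10.0, seed=0):
    """Toy model: return how many tasks each node received."""
    rng = random.Random(seed)
    # Time at which each slot on each node becomes free again.
    slot_free_at = [[0.0] * slots_per_node for _ in range(num_nodes)]
    tasks_per_node = [0] * num_nodes
    remaining = num_tasks
    now = 0.0
    while remaining > 0:
        # The next report arrives after a random gap, from a random node.
        now += rng.expovariate(num_nodes)
        node = rng.randrange(num_nodes)
        for slot in range(slots_per_node):
            if remaining > 0 and slot_free_at[node][slot] <= now:
                # Hand out one task; its duration varies around the mean.
                duration = rng.uniform(0.5, 1.5) * mean_task_time
                slot_free_at[node][slot] = now + duration
                tasks_per_node[node] += 1
                remaining -= 1
    return tasks_per_node


if __name__ == "__main__":
    # Assumed numbers: 480 nodes, but only 300 tasks to hand out.
    counts = simulate_assignment(num_nodes=480, num_tasks=300)
    print("nodes with no work:", counts.count(0))
    print("most tasks on one node:", max(counts))
```

In this model, whenever there are fewer tasks than nodes, some nodes necessarily receive nothing, and the early reporters pick up more than their share.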

It's a really hard thing to say "is the cluster working right"; when bringing one up, everyone
is really guessing about expected performance.
