hadoop-mapreduce-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Hadoop Job Performance as MapReduce count increases?
Date Mon, 09 Nov 2009 04:33:55 GMT
On Sat, Nov 7, 2009 at 11:01 AM, Rob Stewart <robstewart57@googlemail.com> wrote:

> Hi, briefly: I'm writing my dissertation on distributed computation, looking
> in detail at the various interfaces atop Hadoop, including Pig, Hive, JAQL,
> etc.
> One thing I have noticed in early testing is that Pig tends to generate
> more Map tasks for a given query than other interfaces do for an identical
> query design.
> So my question to you MapReduce folks is this:
> ------------
>  If there are 100 Map jobs, spread across 10 DataNodes, and one DataNode
> fails, then approximately 10 Map jobs will be redistributed over the
> remaining 9 DataNodes. If, however, there were 500 Map jobs over the 10
> DataNodes and one of them fails, then 50 Map jobs will be reallocated to the
> remaining 9 DataNodes. Am I to expect a difference in overall performance in
> both of these scenarios?

That depends entirely on how heavy the workloads are in these various jobs.
By the way, the term of art in Hadoop is "map task" -- a single MapReduce
"job" contains a set of map tasks and a set of reduce tasks, each of which
may be executed multiple times (e.g., in the event of node failure); such
re-executions of a given task are known as task attempts.

At any rate, it is likely that on a small number of nodes (e.g., 10) a
higher number of more-granular tasks will result in better overall
performance. If the 500 tasks were doing the same work as the 10 tasks in
the original case, then were a node to fail, 50 tasks would need to be
redistributed. This can happen in several ways depending on which nodes have
free resources; likely all 9 healthy nodes will share in the work. Whereas
if there was a single task on the failed node, then only one other machine
could pick that up.
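To make the redistribution point concrete, here is a toy scheduling model (my own sketch, not Hadoop code; uniform task costs and greedy assignment are simplifying assumptions). With one of 10 nodes failed, 500 fine-grained tasks covering the same total work finish much sooner than 10 coarse tasks, because the leftover work spreads evenly over the 9 survivors:

```python
import heapq

def makespan(num_tasks, num_nodes, total_work=900.0):
    """Greedy list scheduling: each task goes to the earliest-free node."""
    task_cost = total_work / num_tasks
    free_at = [0.0] * num_nodes          # time at which each node becomes free
    heapq.heapify(free_at)
    for _ in range(num_tasks):
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + task_cost)
    return max(free_at)

# 9 surviving nodes after one of 10 fails:
print(makespan(10, 9))    # 10 coarse tasks: one node must run two -> 180.0
print(makespan(500, 9))   # 500 fine tasks: ~56 per node -> ~100.8
```

With coarse tasks, one unlucky node serializes two 90-unit tasks; with fine tasks, the same lost work is amortized across all survivors.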

On the other hand, there is a cost associated with the setup/teardown of
tasks as well as merging their results for reducers; breaking up work into
more tasks is good to a point, but going from 10 to 500 is likely to slow
down the overall result in the average no-failure case. I think most folks
strive for average task size to be anywhere from 128 MB to 2048 MB of
uncompressed data. Making tasks 50x more granular would push average task
running times well below the point where the per-task overhead pays off.
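As a rough sizing sketch (the split size and input volume here are illustrative assumptions, not numbers from this thread), the task count falls out of the input size and the per-split target:

```python
def num_map_tasks(input_bytes, split_bytes=128 * 1024**2):
    """One map task per input split, rounding up (ceiling division)."""
    return -(-input_bytes // split_bytes)

# 64 GB of uncompressed input at 128 MB per split -> 512 map tasks,
# keeping each task inside the 128 MB - 2048 MB sweet spot.
print(num_map_tasks(64 * 1024**3))  # -> 512
```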

- Aaron

> -----------
> The reason for wanting to know this is to perhaps discuss in more detail as
> to whether, in a situation where many faults occur on the cluster, a Hadoop
> job with many Map/Reduce tasks will handle the unreliability better than a
> Hadoop job that has far fewer Map/Reduce tasks?

Depends what you mean by "handle the unreliability." The MapReduce platform
will get your job done in either case. There is no inherent resilience or
weakness to failure based on number of tasks. As stated above, more granular
tasks may result in more even redistribution of additional work.

> If this were the case, is it true to state that, if the reliability of a
> Hadoop cluster (network reliability, DataNode reliability, etc.) were known
> before a job was sent to the cluster, the user submitting the job would want
> to adjust the number of Map/Reduce tasks dependent on the reliability?

Not likely. Other parameters, such as the acceptable number of attempt
failures per task, various timeout values, etc., are considerably better
tuning parameters given a baseline reliability profile for a cluster.
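For reference, those knobs are ordinary job configuration properties. A sketch of what such tuning might look like in a 0.20-era mapred-site.xml (the values below are illustrative, not recommendations):

```xml
<!-- Illustrative values only: tolerate more per-task attempt failures
     and a longer task timeout on a cluster known to be flaky. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>6</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>6</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- milliseconds -->
</property>
```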

The larger a cluster grows, the more often some node in it will fail: a
200-node cluster will see node failures more frequently than a 10-node
cluster. And job size will grow
with cluster size (you don't buy 200 nodes unless you have a lot of work to
do). So a job running "at scale" of 50, 100, or 200 nodes will often see
anywhere from 500 to 10,000 map tasks anyway. This already represents plenty
of opportunity for granular work reassignment; switching a given job from
3,000 to 6,000 map tasks is unlikely to improve the overall running time
very much. Since individual tasks in any multi-thousand-task job will likely
have a high degree of variance in runtime, the efficiency with which the
jobtracker schedules work won't necessarily improve with that many more
tasks anyway.
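A toy cost model illustrates why (the per-task overhead and total-work figures are assumed for illustration, not measured): doubling the task count halves the useful work per task but pays the fixed setup/teardown overhead twice as often.

```python
def job_time(num_tasks, num_nodes, total_work_s=3600.0, overhead_s=5.0):
    """Sequential waves of equal-cost tasks, each paying a fixed overhead."""
    per_task_s = total_work_s / num_tasks + overhead_s
    waves = -(-num_tasks // num_nodes)   # ceiling division
    return waves * per_task_s

# 100 nodes, one hour of aggregate useful work:
print(job_time(3000, 100))   # 30 waves * (1.2s work + 5s overhead)
print(job_time(6000, 100))   # 60 waves * (0.6s work + 5s overhead) -- slower
```

Under these assumed numbers the 6,000-task job is actually slower in the no-failure case, since the added overhead outweighs any scheduling benefit.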

- Aaron

> I may be well off course with this idea, and if this is the case, do let me
> know!
> thanks,
> Rob Stewart
