hadoop-mapreduce-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Hadoop Job Performance as MapReduce count increases?
Date Thu, 12 Nov 2009 02:03:31 GMT
That makes sense. It's worth pointing out that tasks are scheduled on a
"pull" basis -- tasktrackers ask for more work if they have free slots for
tasks -- so it is not a given that all nodes will receive the same number of
tasks. If some tasks take considerably longer (or some nodes are
faster/slower than the others), then those nodes may request fewer tasks
from the jobtracker as the job runs -- so in practice you may not get
results as even as your table suggests.
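As a rough illustration of that pull model (a toy sketch only, not Hadoop's actual jobtracker/tasktracker code; the class and method names here are made up for illustration):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Queue;

    // Toy model of pull-based scheduling: each tracker reports its free slots
    // on heartbeat and is handed at most that many pending tasks. Trackers
    // that finish sooner heartbeat with free slots more often, so they end up
    // pulling more tasks; nothing guarantees an even split across nodes.
    class ToyJobTracker {
        private final Queue<String> pendingTasks = new ArrayDeque<String>();

        ToyJobTracker(List<String> tasks) {
            pendingTasks.addAll(tasks);
        }

        // Called on each tasktracker heartbeat; returns tasks for it to run.
        synchronized List<String> heartbeat(String tracker, int freeSlots) {
            List<String> assigned = new ArrayList<String>();
            while (assigned.size() < freeSlots && !pendingTasks.isEmpty()) {
                assigned.add(pendingTasks.poll());
            }
            return assigned;
        }

        public static void main(String[] args) {
            ToyJobTracker jt = new ToyJobTracker(
                Arrays.asList("m_000000", "m_000001", "m_000002", "m_000003"));
            // A tracker with two free slots pulls two tasks; one slot pulls one.
            System.out.println(jt.heartbeat("tracker_a", 2)); // [m_000000, m_000001]
            System.out.println(jt.heartbeat("tracker_b", 1)); // [m_000002]
        }
    }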

- Aaron

On Wed, Nov 11, 2009 at 8:30 AM, Rob Stewart <robstewart57@googlemail.com> wrote:

>
> Hi Aaron,
>
> your response was very useful indeed, thank you very much.
>
> OK, I've documented the scenario (relevant to my experiments), where the
> cluster is very small, only 10 nodes.
>
> I have uploaded this section only to :
>
> http://linuxsoftwareblog.com/Hadoop/small_cluster_scenario.png
>
> Can I ask: do the paragraph and the subsequent table make sense to you, and
> do they reflect the true nature of the potential performance issues with a
> very small MapReduce task count on an unreliable small cluster?
>
>
> thanks,
>
>
> Rob Stewart
>
>
>
>
> 2009/11/9 Aaron Kimball <aaron@cloudera.com>
>
>
>>
>> On Sat, Nov 7, 2009 at 11:01 AM, Rob Stewart <robstewart57@googlemail.com>
>> wrote:
>>
>>> Hi, briefly: I'm writing my dissertation on distributed computation, looking
>>> in detail at the various interfaces atop Hadoop, including Pig, Hive, JAQL,
>>> etc.
>>> One thing I have noticed in early testing is that Pig tends to generate
>>> more map tasks for a given query than other interfaces do for an identical
>>> query design.
>>>
>>> So my question to you MapReduce folks is this:
>>> ------------
>>> If there are 100 Map jobs spread across 10 DataNodes and one DataNode
>>> fails, then approximately 10 Map jobs will be redistributed over the
>>> remaining 9 DataNodes. If, however, there were 500 Map jobs over the 10
>>> DataNodes and one of them fails, then 50 Map jobs will be reallocated to the
>>> remaining 9 DataNodes. Am I to expect a difference in overall performance in
>>> both of these scenarios?
>>>
>>
>>
>> That depends entirely on how heavy the workloads are in these various
>> jobs. By the way, the term of art in Hadoop is "map task" -- a single
>> MapReduce "job" contains a set of map tasks and a set of reduce tasks, each
>> of which may be executed multiple times (e.g., in the event of node
>> failure); such re-executions of a given task are known as task attempts.
>>
>> At any rate, it is likely that on a small number of nodes (e.g., 10) a
>> higher number of more-granular tasks will result in better overall
>> performance. If the 500 tasks were doing the same total work as ten
>> coarse-grained tasks (one per node), then were a node to fail, 50 tasks would need to be
>> redistributed. This can happen in several ways depending on which nodes have
>> free resources; likely all 9 healthy nodes will share in the work. Whereas
>> if there was a single task on the failed node, then only one other machine
>> could pick that up.
>>
>> On the other hand, there is a cost associated with the setup/teardown of
>> tasks as well as merging their results for reducers; breaking up work into
>> more tasks is good to a point, but going from 10 to 500 is likely to slow
>> down the overall result in the average no-failure case. I think most folks
>> strive for average task size to be anywhere from 128 MB to 2048 MB of
>> uncompressed data. Increasing task granularity 50x would push average task
>> running times down to where per-task overhead dominates the useful work.
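For what it's worth, the usual way to steer task size toward that range is through the input split size rather than a hard task count. A minimal sketch against the classic org.apache.hadoop.mapred API (the class name, input path, and the 256 MB / 40-task values below are made-up examples, not recommendations):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class GranularityExample {
        public static JobConf configure() {
            JobConf conf = new JobConf(GranularityExample.class);
            FileInputFormat.setInputPaths(conf, new Path("/data/input"));

            // Don't create splits smaller than ~256 MB (value in bytes), so
            // each map task processes a reasonable chunk of input.
            conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

            // A hint only: the InputFormat derives the actual map count from
            // the input size and the split-size setting above.
            conf.setNumMapTasks(40);
            return conf;
        }
    }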
>>
>> - Aaron
>>
>>
>>
>>> -----------
>>>
>>> The reason for wanting to know this is to perhaps discuss in more detail
>>> whether, in a situation where many faults occur on the cluster, a Hadoop
>>> job with many Map/Reduce tasks will handle the unreliability better than a
>>> Hadoop job that has far fewer Map/Reduce tasks.
>>>
>>
>> Depends what you mean by "handle the unreliability." The MapReduce
>> platform will get your job done in either case. There is no inherent
>> resilience or weakness to failure based on number of tasks. As stated above,
>> more granular tasks may result in more even redistribution of additional
>> work.
>>
>>
>>>
>>> If this were the case, is it true to say that, if the reliability of a
>>> Hadoop cluster (network reliability, DataNode reliability, etc.) were known
>>> before a job was sent to the cluster, the user submitting the job would want
>>> to adjust the number of Map/Reduce tasks depending on that reliability?
>>>
>>>
>> Not likely. Other parameters, such as the acceptable number of attempt
>> failures per task, various timeout values, etc., are considerably better
>> tuning parameters given a baseline reliability profile for a cluster.
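A minimal sketch of those kinds of knobs with the classic JobConf API (the specific values below are invented for illustration, not tuned recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class ReliabilityTuning {
        public static void tune(JobConf conf) {
            // Allow each map and reduce task a few extra attempts before the
            // job as a whole is failed.
            conf.setMaxMapAttempts(6);
            conf.setMaxReduceAttempts(6);

            // Tolerate up to 5% of map tasks failing outright without killing
            // the job.
            conf.setMaxMapTaskFailuresPercent(5);

            // Give tasks 20 minutes without progress before they are declared
            // hung (value in milliseconds).
            conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
        }
    }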
>>
>> A cluster is likely to see node failures more often as it grows: a cluster
>> of 200 nodes will experience failures more frequently than a cluster of 10
>> nodes. And job size will
>> grow with cluster size (you don't buy 200 nodes unless you have a lot of
>> work to do). So a job running "at scale" of 50, 100, or 200 nodes will often
>> see anywhere from 500 to 10,000 map tasks anyway. This already represents
>> plenty of opportunity for granular work reassignment; switching a given job
>> from 3,000 to 6,000 map tasks is unlikely to improve the overall running
>> time very much. Since individual tasks in any multi-thousand-task job will
>> likely vary widely in runtime, the jobtracker's efficiency at scheduling
>> work won't necessarily be improved by having that many more tasks anyway.
>>
>> - Aaron
>>
>>
>>>
>>> I may be well off course with this idea, and if this is the case, do let
>>> me know!
>>>
>>>
>>> thanks,
>>>
>>>
>>> Rob Stewart
>>>
>>
>>
>
