hadoop-mapreduce-issues mailing list archives

From "David Rosenstrauch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
Date Sun, 20 Apr 2014 18:26:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975212#comment-13975212 ]

David Rosenstrauch commented on MAPREDUCE-5402:
-----------------------------------------------

Looks like the patch is working.  I tested it on a few heavy-load jobs today, and it definitely
worked around the "Too many chunks created" error I was hitting.  I probably still need to tune
the parameters a bit to optimize my distcp runs, but the basic functionality does seem to work.
(Note: I did my testing by applying your patch to the backported version of distcp at
https://github.com/QwertyManiac/hadoop-distcp-mr1, as described in
https://issues.cloudera.org/browse/DISTRO-420, not to the current Hadoop trunk version of the code.)
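
In case it's useful, here is roughly how I'm passing the tuning knobs.  This is a sketch only:
mapred.listing.split.ratio is the split-ratio property discussed in this issue, but the max-chunks
key below is just a placeholder, since I haven't confirmed the property name the patch ends up using.

import org.apache.hadoop.conf.Configuration;

public class MaxChunksOverrideExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Split-ratio property discussed in this issue.
    conf.setInt("mapred.listing.split.ratio", 10);
    // Placeholder key for the newly overridable MAX_CHUNKS_TOLERABLE limit;
    // the real property name comes from the patch.
    conf.setInt("distcp.max.chunks.tolerable", 10000);

    // The patched DynamicInputFormat would read the limit from the job conf,
    // falling back to the old hard-coded value of 400.
    int maxChunksTolerable = conf.getInt("distcp.max.chunks.tolerable", 400);
    System.out.println("max chunks tolerable: " + maxChunksTolerable);
  }
}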

I'm still seeing a bit of long-tail behavior, but now it's more like 100 mappers taking longer
to complete rather than 1 or 2, which indicates that the copy is being distributed more evenly.

Here are some stats recorded from a few of these jobs:

Job 1:
Total number of files:	20,027
Number of files copied:	20,017
Number of files skipped:	10
Number of bytes copied:	84,802,510,328
Number of mappers:	512
Split ratio:	10
Max chunks tolerable:	10,000
Number of dynamic-chunk-files created:	5,012
Run time:	5mins, 19sec

Job 2:
Total number of files:	36,374
Number of files copied:	17,160
Number of files skipped:	19,214
Number of bytes copied:	1,196,591,437,407
Number of mappers:	512
Split ratio:	10
Max chunks tolerable:	10,000
Number of dynamic-chunk-files created:	4,714
Run time:	50mins, 50sec

Job 2 can obviously be optimized a bit further.  (I.e., it will distribute much better if I
eliminate all those skipped files.)  But this is still an improvement - that job used to take
over 2 hours to run.
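
For reference, here is a back-of-the-envelope sketch of how I understand these numbers to relate.
It mirrors the numMaps * splitRatio sizing described in MAPREDUCE-2765; it is not the actual
DynamicInputFormat code, and the variable names are mine.

// Back-of-the-envelope sketch of the chunk sizing implied by the Job 1 stats above.
// Mirrors the numMaps * splitRatio sizing described in MAPREDUCE-2765, not the real
// DynamicInputFormat implementation.
public class ChunkSizingEstimate {
  public static void main(String[] args) {
    int numFiles = 20027;            // Job 1: total number of files
    int numMaps = 512;               // number of mappers
    int splitRatio = 10;             // mapred.listing.split.ratio
    int maxChunksTolerable = 10000;  // the limit this patch makes overridable

    int targetChunks = numMaps * splitRatio;  // 5,120 (5,012 chunk files were actually created)
    int entriesPerChunk = (int) Math.ceil((double) numFiles / targetChunks);  // ~4 files per chunk

    System.out.printf("target chunks: %d, entries per chunk: ~%d%n", targetChunks, entriesPerChunk);
    // With the old hard-coded MAX_CHUNKS_TOLERABLE of 400, 5,120 chunks would have triggered
    // the "Too many chunks created" error; with the override raised to 10,000 the job runs.
    System.out.println("within tolerance: " + (targetChunks <= maxChunksTolerable));
  }
}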

Thanks again for working on the fix for this issue.  If you have any additional questions, please
let me know.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5402
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp, mrv2
>            Reporter: David Rosenstrauch
>            Assignee: Tsuyoshi OZAWA
>         Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author describes the implementation of DynamicInputFormat, with one of the main motivations cited being to reduce the chance of long tails, where a few leftover mappers run much longer than the rest.
> However, today I ran into a situation where I experienced exactly such a long tail using DistCpV2 and DynamicInputFormat.  And when I tried to alleviate the problem by overriding the number of mappers and the split ratio used by the DynamicInputFormat, I was prevented from doing so by the hard-coded limit set in the code by the MAX_CHUNKS_TOLERABLE constant (currently set to 400).
> This constant is actually set quite low for production use.  (See a description of my use case below.)  And although MAPREDUCE-2765 states that this is an "overridable maximum", reading through the code there does not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum number of chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a DistCpV2 job on a cluster with 8 machines containing 128 map slots.  The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode the number of mappers for the job from the default of 20 to 128, so as to more properly parallelize the copy across the cluster.  The number of chunk files created was calculated as 241, and mapred.num.entries.per.chunk was calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map tasks, each of which had been running for over 2 hours.  The reason was that each of the 12 files those mappers were copying was quite large (several hundred megabytes) and took ~20 minutes to copy.  Meanwhile, all of the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat.  If I were able to, say, quadruple the number of chunk files created, each chunk would contain only 3 files, and these large files would have been distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio to, say, 10 - DynamicInputFormat responded with an exception ("Too many chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease split-ratio to proceed."), presumably because I exceeded the MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I can't personally see any.
> If this limit has no particular logic behind it, then it should be overridable - or, even better, removed altogether.  After all, I'm not sure I see any need for it.  Even if numMaps * splitRatio resulted in an extraordinarily large number, if the code were modified so that the number of chunks were calculated as Math.min(numMaps * splitRatio, numFiles), then there would be no need for MAX_CHUNKS_TOLERABLE.  In the worst-case scenario where the product of numMaps and splitRatio is large, capping the number of chunks at the number of files (numberOfChunks = numberOfFiles) would result in 1 file per chunk - the maximum parallelization possible.  That may not be the best-tuned solution for some users, but I would think it should be left up to the user to deal with the consequences of not having tuned their job properly.  Certainly that would be better than an arbitrary hard-coded limit that *prevents* proper parallelization when dealing with large files and/or large numbers of mappers.
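
For what it's worth, the Math.min capping proposed in the description above could look roughly
like this.  It is an illustrative sketch with my own method and variable names, not the actual
DynamicInputFormat code.

// Sketch of the capping proposed above: let the chunk count be driven by
// numMaps * splitRatio, but never exceed the number of files, so no separate
// MAX_CHUNKS_TOLERABLE limit is needed.  Illustrative only.
public final class ChunkCountProposal {
  static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
    // Worst case: numMaps * splitRatio is huge, so the count falls back to one file
    // per chunk (numberOfChunks == numberOfFiles), the maximum possible parallelism.
    return Math.min(numMaps * splitRatio, numFiles);
  }

  public static void main(String[] args) {
    // The scenario from the description: 128 maps, split ratio 10, ~2800 files.
    System.out.println(numberOfChunks(128, 10, 2800));  // 1280 chunks, ~2-3 files each
  }
}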



--
This message was sent by Atlassian JIRA
(v6.2#6252)
