hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2349) speed up list[located]status calls from input formats
Date Thu, 20 Mar 2014 02:37:51 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941329#comment-13941329 ]

Hadoop QA commented on MAPREDUCE-2349:

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4448//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4448//console

This message is automatically generated.

> speed up list[located]status calls from input formats
> -----------------------------------------------------
>                 Key: MAPREDUCE-2349
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2349
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Joydeep Sen Sarma
>            Assignee: Siddharth Seth
>         Attachments: MAPREDUCE-2349.1.wip.txt, MAPREDUCE-2349.2.txt, MAPREDUCE-2349.3.txt, MAPREDUCE-2349.4.txt, MAPREDUCE-2349.5.txt
> When a job has many input paths, the listStatus calls (or the improved listLocatedStatus calls) invoked from the getSplits() method can take a long time. Most of that time is spent waiting for the previous call to complete before dispatching the next one.
> This can be sped up considerably by dispatching multiple calls at once (via executors). If the same filesystem client is used, the calls are pipelined much better (since calls are serialized) and impose no extra burden on the namenode, while greatly reducing the latency seen by the client. In a simple test during non-peak hours, this reduced the getSplits() time from about 3s to about 0.5s.
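
The idea in the description above (dispatch all listing calls at once through an executor, then collect the results) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class name ParallelListing, the method listAll, and the lister parameter are hypothetical, and lister stands in for a per-path call such as listStatus so that the sketch runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelListing {

    // Hypothetical helper: 'lister' is a stand-in for a per-path listing call
    // (for example a listStatus call on a shared filesystem client).
    public static <T> List<List<T>> listAll(List<String> paths,
                                            Function<String, List<T>> lister,
                                            int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Dispatch every listing call up front instead of waiting for
            // each one to finish before issuing the next.
            List<Future<List<T>>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<List<T>> task = () -> lister.apply(path);
                pending.add(pool.submit(task));
            }
            // Collect the results in the original input order.
            List<List<T>> results = new ArrayList<>();
            for (Future<List<T>> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = List.of("/data/a", "/data/b", "/data/c");
        // Fake lister that returns two made-up children per directory.
        List<List<String>> listed =
                listAll(paths, p -> List.of(p + "/part-0", p + "/part-1"), 4);
        System.out.println(listed);
    }
}
```

Results are gathered in input order, so a caller that builds splits from them sees the same ordering as the sequential loop; the thread count is a free parameter in this sketch and would need tuning against the namenode load mentioned above.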

This message was sent by Atlassian JIRA
