hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4664) Parallelize job initialization
Date Thu, 12 Mar 2009 12:58:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jothi Padmanabhan updated HADOOP-4664:

    Attachment: hadoop-4664-v3.patch

Patch incorporating review comments.

Test Patch result:
    [exec] +1 overall.  
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]     +1 release audit.  The applied patch does not increase the total number of
release audit warnings.

> Parallelize job initialization
> ------------------------------
>                 Key: HADOOP-4664
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4664
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Matei Zaharia
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.0
>         Attachments: hadoop-4664-v1.patch, hadoop-4664-v2.patch, hadoop-4664-v3.patch,
> The job init thread currently initializes one job at a time. However, this is a lengthy
and partly IO-bound process because all of the job's block locations need to be resolved through
the namenode and a map of them needs to be built. It can take tens of seconds. As a result,
the cluster sometimes initializes jobs too slowly for full utilization to be achieved, if
there are many small jobs queued up. It would be better to have a pool of threads that initialize
multiple jobs in parallel. One thing to be careful of, however, is not causing deadlocks or
holding locks for too long in these threads.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message