hadoop-common-dev mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4664) Parallelize job initialization
Date Sat, 15 Nov 2008 04:46:44 GMT
Parallelize job initialization
------------------------------

                 Key: HADOOP-4664
                 URL: https://issues.apache.org/jira/browse/HADOOP-4664
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Matei Zaharia


The job init thread currently initializes one job at a time. However, initialization is a lengthy
and partly IO-bound process: all of the job's block locations must be resolved through the
namenode, and a map of them must be built. It can take tens of seconds. As a result, when many
small jobs are queued up, the cluster sometimes initializes them too slowly to reach full
utilization. It would be better to have a pool of threads that initializes multiple jobs in
parallel. One thing to be careful of, however, is that these threads must not cause deadlocks or
hold locks for too long.
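A minimal sketch of the thread-pool idea, in Java since that is Hadoop's language. It assumes hypothetical stand-ins (`ParallelJobInit`, `PendingJob`, `lookupBlockLocations`) rather than the real JobTracker classes, and it fakes the namenode round trip with a short sleep. Each pool thread does the slow, IO-bound lookup with no shared lock held, then takes the shared lock only long enough to publish the job as runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: initialize queued jobs on a fixed-size thread pool (not Hadoop's API). */
public class ParallelJobInit {

    /** Hypothetical job whose input splits need their block locations resolved. */
    static final class PendingJob {
        final String id;
        final List<String> splits;
        volatile Map<String, String> splitLocations; // set once init finishes

        PendingJob(String id, List<String> splits) {
            this.id = id;
            this.splits = splits;
        }
    }

    private final ExecutorService pool;
    // Shared scheduler state, guarded by 'this'; only touched briefly after init.
    private final List<String> runnableJobs = new ArrayList<>();

    ParallelJobInit(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Hypothetical stand-in for a slow namenode block-location lookup. */
    static String lookupBlockLocations(String split) {
        try {
            Thread.sleep(50); // simulate IO latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "host-for-" + split;
    }

    Future<?> submit(PendingJob job) {
        return pool.submit(() -> {
            // 1. Slow, IO-bound work: no shared lock is held here.
            Map<String, String> locs = new HashMap<>();
            for (String split : job.splits) {
                locs.put(split, lookupBlockLocations(split));
            }
            job.splitLocations = locs;
            // 2. Hold the shared lock only to publish the result; no IO or
            //    calls into other locked code happen inside it.
            synchronized (this) {
                runnableJobs.add(job.id);
            }
        });
    }

    synchronized List<String> snapshotRunnable() {
        return new ArrayList<>(runnableJobs);
    }

    /** Demo: initialize jobs concurrently and return the published job IDs, sorted. */
    static List<String> runDemo(int threads, int jobCount) {
        ParallelJobInit init = new ParallelJobInit(threads);
        List<PendingJob> jobs = new ArrayList<>();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            PendingJob job = new PendingJob("job_" + i, Arrays.asList("split-a", "split-b"));
            jobs.add(job);
            futures.add(init.submit(job));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait for init and propagate any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            init.pool.shutdown(); // let the worker threads exit
        }
        for (PendingJob job : jobs) {
            if (job.splitLocations == null || job.splitLocations.size() != job.splits.size()) {
                throw new IllegalStateException("job not fully initialized: " + job.id);
            }
        }
        List<String> ids = init.snapshotRunnable();
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 4)); // prints [job_0, job_1, job_2, job_3]
    }
}
```

The pool size bounds how many jobs hit the namenode at once, and because the lock is never held across the lookup, one slow job cannot stall the others or the code that reads the runnable list.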

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

