hadoop-common-dev mailing list archives

From "Holden Robbins (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2990) Ability to thread task execution
Date Mon, 10 Mar 2008 22:48:46 GMT
Ability to thread task execution

                 Key: HADOOP-2990
                 URL: https://issues.apache.org/jira/browse/HADOOP-2990
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
         Environment: All
            Reporter: Holden Robbins

Currently Hadoop spawns a single-threaded JVM for each task.  While good for many tasks, this
does not maximize resource usage for slaves that have many cores (machines with more cores
are getting more cost-effective every day) _and_ are running jobs that require many gigabytes
of read-only in-memory resources to maximize throughput.  Running in separate JVMs requires
loading the same large data set redundantly in each JVM, which reduces the number of parallel
tasks that can run per machine even though more CPUs are available.
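
To illustrate the idea in general terms, here is a minimal sketch in plain Java of several
worker threads in one JVM reading a single copy of a read-only resource.  It does not use any
Hadoop API and is not the patch under test; the names SharedResourceSketch, loadLookupTable
and processSplits are hypothetical, and the three-entry map only stands in for a resource that
would be many gigabytes in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedResourceSketch {

    // Hypothetical stand-in for a large read-only resource, loaded once per JVM.
    static Map<String, Integer> loadLookupTable() {
        Map<String, Integer> table = new HashMap<>();
        table.put("a", 1);
        table.put("b", 2);
        table.put("c", 3);
        // Never modified after this point, so concurrent reads need no locking.
        return Collections.unmodifiableMap(table);
    }

    // Runs one unit of work per input split on a fixed thread pool.
    // Every worker reads the same table instance instead of loading its own copy.
    static long processSplits(Map<String, Integer> table,
                              List<List<String>> splits,
                              int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<String> split : splits) {
                Callable<Long> work = () -> {
                    long sum = 0;
                    for (String key : split) {
                        sum += table.getOrDefault(key, 0);
                    }
                    return sum;
                };
                results.add(pool.submit(work));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that worker to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> table = loadLookupTable();
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "a"),
                Arrays.asList("b", "x"));
        System.out.println(processSplits(table, splits, 3));
    }
}
```

With one JVM per task, each of the three splits above would need its own copy of the table;
here the table is held in memory once, whatever the number of worker threads.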

Adding this ability will give Hadoop users the flexibility to trade off memory usage and
throughput against task segmentation.

Note: This is a blocking bug in porting processes over to Hadoop for my own organization.
I am testing a patch for this now that leaves the existing behavior for single-threaded operation
intact.  All synchronization is done through wrapper classes and helper methods and should
not add any overhead to non-threaded processes.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
