hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-249) Improving Map -> Reduce performance and Task JVM reuse
Date Sun, 14 Sep 2008 14:41:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Devaraj Das updated HADOOP-249:
-------------------------------

    Attachment: 249-with-jvmID.patch

Here is a patch that addresses the task logs issue. In short the patch has the following:
1) Addition of queues in the tasktracker for maps and reduces where a new task is queued up.
When a slot becomes free a task is pulled from one of the relevant queues and initialized/launched

2) When a tasktracker reports with tasks in COMMIT_PENDING status, it could be given a new
task. This new task would be queued up at the tasktracker and launched only when the task
doing the commit finishes. 
3) JvmManager is the class that keeps track of running JVMs and tasks running on the JVMs.
When a JVM exits the appropriate task (if any was being run by the JVM) is failed. 
4) New task state has been added for signifying INITIALIZED and ready to run. Till a task
is initialized (like task locallization, distributed cache init, etc.,) a JVM is not given
that. 
5) The log handling is done this way:
   - I have a small indirection file, which I call index.log, in each attempt dir. Let's say
that the JVM for attempt_1_1
     also runs attempt_foo_bar. The index.log file in the directory attempt_foo_bar would
look like: 
         LOG_DIR: attempt_1_1
         STDOUT: <start of stdout for attempt_foo_bar in the stdout file in attempt_1_1
directory> <length>
         SYSLOG: <similar to above>
         STDERR: <similar to above>
   - I create this log.index file at task startup, and have a thread to continously update
the lengths of the log files
     so that one can see the logs of a running task.
   - I modified tools like TaskLog.Reader to go through this indirection when someone asks
for the log files.      
6) A new class JVMId has been created to create/identify JVM IDs instead of plain integers.
(Arun had requested for this offline)

Have tested large sort with this patch.

The only thing remaining is the handling of the workDir for tasks (that has things like Distributed
Cache symlinks). I am going to address that in the next patch very shortly.

Would really appreciate a review on this one.

> Improving Map -> Reduce performance and Task JVM reuse
> ------------------------------------------------------
>
>                 Key: HADOOP-249
>                 URL: https://issues.apache.org/jira/browse/HADOOP-249
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.0
>            Reporter: Benjamin Reed
>            Assignee: Devaraj Das
>         Attachments: 249-3.patch, 249-with-jvmID.patch, 249.1.patch, 249.2.patch, disk_zoom.patch,
image001.png, task_zoom.patch
>
>
> These patches are really just to make Hadoop start trotting. It is still at least an
order of magnitude slower than it should be, but I think these patches are a good start.
> I've created two patches for clarity. They are not independent, but could easily be made
so.
> The disk-zoom patch is a performance trifecta: less disk IO, less disk space, less CPU,
and overall a tremendous improvement. The patch is based on the following observation: every
piece of data from a map hits the disk once on the mapper, and 3 (+plus sorting) times on
the reducer. Further, the entire input for the reduce step is sorted together maximizing the
sort time. This patch causes:
> 1)  the mapper to sort the relatively small fragments at the mapper which causes two
hits to the disk, but they are smaller files.
> 2) the reducer copies the map output and may merge (if more than 100 outputs are present)
with a couple of other outputs at copy time. No sorting is done since the map outputs are
sorted.
> 3) the reducer  will merge the map outputs on the fly in memory at reduce time.
> I'm attaching the performance graph (with just the disk-zoom patch) to show the results.
This benchmark uses a random input and null output to remove any DFS performance influences.
The cluster of 49 machines I was running on had limited disk space, so I was only able to
run to a certain size on unmodified Hadoop. With the patch we use 1/3 the amount of disk space.
> The second patch allows the task tracker to reuse processes to avoid the over-head of
starting the JVM. While JVM startup is relatively fast, restarting a Task causes disk IO and
DFS operations that have a negative impact on the rest of the system. When a Task finishes,
rather than exiting, it reads the next task to run from stdin. We still isolate the Task runtime
from TaskTracker, but we only pay the startup penalty once.
> This second patch also fixes two performance issues not related to JVM reuse. (The reuse
just makes the problems glaring.) First, the JobTracker counts all jobs not just the running
jobs to decide the load on a tracker. Second, the TaskTracker should really ask for a new
Task as soon as one finishes rather than wait the 10 secs.
> I've been benchmarking the code alot, but I don't have access to a really good cluster
to try the code out on, so please treat it as experimental. I would love to feedback.
> There is another obvious thing to change: ReduceTasks should start after the first batch
of MapTasks complete, so that 1) they have something to do, and 2) they are running on the
fastest machines.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message