hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Reed <br...@yahoo-inc.com>
Subject Making the elephant run
Date Tue, 23 May 2006 23:00:49 GMT
Actually, these patches are really just to make Hadoop start  
trotting. It is still at least an order of magnitude slower than it  
should be, but I think these patches are a good start.

I've created two patches for clarity. They are not independent, but  
could easily be made so.

The disk-zoom patch is a performance trifecta: less disk IO, less  
disk space, less CPU, and overall a tremendous improvement. The patch  
is based on the following observation: every piece of data from a map  
hits the disk once on the mapper, and 3 (+plus sorting) times on the  
reducer. Further, the entire input for the reduce step is sorted  
together maximizing the sort time. This patch causes:

1)  the mapper to sort the relatively small fragments at the mapper  
which causes two hits to the disk, but they are smaller files.
2) the reducer copies the map output and may merge (if more than 100  
outputs are present) with a couple of other outputs at copy time. No  
sorting is done since the map outputs are sorted.
3) the reducer  will merge the map outputs on the fly in memory at  
reduce time.

I'm attaching the performance graph (with just the disk-zoom patch)  
to show the results. This benchmark uses a random input and null  
output to remove any DFS performance influences. The cluster of 49  
machines I was running on had limited disk space, so I was only able  
to run to a certain size on unmodified Hadoop. With the patch we use  
1/3 the amount of disk space.

The second patch allows the task tracker to reuse processes to avoid  
the over-head of starting the JVM. While JVM startup is relatively  
fast, restarting a Task causes disk IO and DFS operations that have a  
negative impact on the rest of the system. When a Task finishes,  
rather than exiting, it reads the next task to run from stdin. We  
still isolate the Task runtime from TaskTracker, but we only pay the  
startup penalty once.

This second patch also fixes two performance issues not related to  
JVM reuse. (The reuse just makes the problems glaring.) First, the  
JobTracker counts all jobs not just the running jobs to decide the  
load on a tracker. Second, the TaskTracker should really ask for a  
new Task as soon as one finishes rather than wait the 10 secs.

I've been benchmarking the code alot, but I don't have access to a  
really good cluster to try the code out on, so please treat it as  
experimental. I would love to feedback.

There is another obvious thing to change: ReduceTasks should start  
after the first batch of MapTasks complete, so that 1) they have  
something to do, and 2) they are running on the fastest machines.


  • Unnamed multipart/mixed (inline, None, 0 bytes)
View raw message