hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Reed <br...@yahoo-inc.com>
Subject Re: Making the elephant run
Date Wed, 24 May 2006 01:01:47 GMT
Sorry, I forgot that attachments don't work for me on the list...  
I've opened an issue and put the attachments there. Issue #249  
Improving Map -> Reduce performance..

thanx
ben

On May 23, 2006, at 4:00 PM, Ben Reed wrote:

> Actually, these patches are really just to make Hadoop start  
> trotting. It is still at least an order of magnitude slower than it  
> should be, but I think these patches are a good start.
>
> I've created two patches for clarity. They are not independent, but  
> could easily be made so.
>
> The disk-zoom patch is a performance trifecta: less disk IO, less  
> disk space, less CPU, and overall a tremendous improvement. The  
> patch is based on the following observation: every piece of data  
> from a map hits the disk once on the mapper, and 3 (+plus sorting)  
> times on the reducer. Further, the entire input for the reduce step  
> is sorted together maximizing the sort time. This patch causes:
>
> 1)  the mapper to sort the relatively small fragments at the mapper  
> which causes two hits to the disk, but they are smaller files.
> 2) the reducer copies the map output and may merge (if more than  
> 100 outputs are present) with a couple of other outputs at copy  
> time. No sorting is done since the map outputs are sorted.
> 3) the reducer  will merge the map outputs on the fly in memory at  
> reduce time.
>
> I'm attaching the performance graph (with just the disk-zoom patch)  
> to show the results. This benchmark uses a random input and null  
> output to remove any DFS performance influences. The cluster of 49  
> machines I was running on had limited disk space, so I was only  
> able to run to a certain size on unmodified Hadoop. With the patch  
> we use 1/3 the amount of disk space.
>
> The second patch allows the task tracker to reuse processes to  
> avoid the over-head of starting the JVM. While JVM startup is  
> relatively fast, restarting a Task causes disk IO and DFS  
> operations that have a negative impact on the rest of the system.  
> When a Task finishes, rather than exiting, it reads the next task  
> to run from stdin. We still isolate the Task runtime from  
> TaskTracker, but we only pay the startup penalty once.
>
> This second patch also fixes two performance issues not related to  
> JVM reuse. (The reuse just makes the problems glaring.) First, the  
> JobTracker counts all jobs not just the running jobs to decide the  
> load on a tracker. Second, the TaskTracker should really ask for a  
> new Task as soon as one finishes rather than wait the 10 secs.
>
> I've been benchmarking the code alot, but I don't have access to a  
> really good cluster to try the code out on, so please treat it as  
> experimental. I would love to feedback.
>
> There is another obvious thing to change: ReduceTasks should start  
> after the first batch of MapTasks complete, so that 1) they have  
> something to do, and 2) they are running on the fastest machines.
>
> thanx
> ben
>
>


Mime
View raw message