hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4730) multi-threaded merge phase
Date Wed, 26 Nov 2008 02:54:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650847#action_12650847 ]

Christian Kunz commented on HADOOP-4730:
----------------------------------------

I was monitoring the long tail (a single reducer) of a job and noticed that it spent a lot of time in the merge phase, running the merges one after another in a single thread. Here is the log:

2008-11-25 16:27:52,222 INFO org.apache.hadoop.mapred.ReduceTask: Initiating final on-disk merge with 394 files
2008-11-25 16:27:52,343 INFO org.apache.hadoop.mapred.Merger: Merging 394 sorted segments
2008-11-25 16:27:57,982 INFO org.apache.hadoop.mapred.Merger: Merging 97 intermediate segments out of a total of 394
2008-11-25 17:10:23,569 INFO org.apache.hadoop.mapred.Merger: Merging 100 intermediate segments out of a total of 298
2008-11-25 17:59:22,272 INFO org.apache.hadoop.mapred.Merger: Merging 100 intermediate segments out of a total of 199
2008-11-25 18:48:48,813 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 100 segments left of total size: 113074719385 bytes
2008-11-25 18:48:50,521 INFO org.apache.hadoop.mapred.pipes.PipesReducer: starting application
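
The segment counts in the log are consistent with a merge factor of 100 (presumably the `io.sort.factor` setting; that is an assumption). As I understand Hadoop's `Merger`, the first pass is shrunk so that every later pass merges exactly `factor` segments: (394 - 1) mod 99 = 96, so the first pass merges 97 segments and leaves 394 - 97 + 1 = 298. The hypothetical class below only replays that arithmetic; it is a sketch, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replays the per-pass segment counts of a multi-pass merge.
// Assumptions: merge factor 100 (inferred from the log), and a first pass that is
// shrunk so that every later pass merges exactly `factor` segments.
public class MergePasses {

    // Number of segments merged in pass `passNo` when `numSegments` remain.
    static int passFactor(int factor, int passNo, int numSegments) {
        if (passNo > 1 || numSegments <= factor || factor == 1) {
            return factor;
        }
        int mod = (numSegments - 1) % (factor - 1);
        return mod == 0 ? factor : mod + 1;
    }

    // Segments merged in each intermediate pass, followed by the number of
    // segments left for the final merge pass.
    static List<Integer> plan(int numSegments, int factor) {
        List<Integer> passes = new ArrayList<>();
        int remaining = numSegments;
        int passNo = 1;
        while (remaining > factor) {
            int n = passFactor(factor, passNo, remaining);
            passes.add(n);
            remaining = remaining - n + 1; // n inputs become one output segment
            passNo++;
        }
        passes.add(remaining);
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(plan(394, 100)); // prints [97, 100, 100, 100]
    }
}
```

The output matches the log: intermediate passes of 97, 100 and 100 segments, then a last merge pass over 100 segments.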

Between 16:28 and 18:48, three intermediate merges ran one after another, each taking 40 to 50 minutes. Running them in parallel could have saved about 1.5 hours.
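
The 1.5 hour estimate can be checked against the log timestamps. This sketch assumes that each "Merging N intermediate segments" line marks the start of a pass and the next Merger line marks its end, and that with enough cores all three passes could run fully concurrently, so the ideal parallel time is the longest single pass. `MergeTimings` is a hypothetical helper, not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: estimates the time saved by running the merge passes in parallel.
public class MergeTimings {

    // Timestamps copied from the Merger log lines: start of passes 1, 2 and 3,
    // then the "Down to the last merge-pass" line.
    static final LocalTime[] MARKS = {
        LocalTime.parse("16:27:57.982"),
        LocalTime.parse("17:10:23.569"),
        LocalTime.parse("17:59:22.272"),
        LocalTime.parse("18:48:48.813"),
    };

    // Assumes pass i runs from MARKS[i] to MARKS[i + 1].
    static Duration[] passDurations() {
        Duration[] d = new Duration[MARKS.length - 1];
        for (int i = 0; i < d.length; i++) {
            d[i] = Duration.between(MARKS[i], MARKS[i + 1]);
        }
        return d;
    }

    // Single-threaded: passes run back to back, so their durations add up.
    static Duration sequentialTotal(Duration[] passes) {
        Duration total = Duration.ZERO;
        for (Duration p : passes) {
            total = total.plus(p);
        }
        return total;
    }

    // Ideal multi-threaded case: all passes run at once, so the longest one dominates.
    static Duration parallelTotal(Duration[] passes) {
        Duration longest = Duration.ZERO;
        for (Duration p : passes) {
            if (p.compareTo(longest) > 0) {
                longest = p;
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        Duration[] passes = passDurations();
        Duration seq = sequentialTotal(passes);
        Duration par = parallelTotal(passes);
        System.out.printf("sequential: %d min, parallel: %d min, saved: %d min%n",
                seq.toMinutes(), par.toMinutes(), seq.minus(par).toMinutes());
    }
}
```

With these timestamps it prints `sequential: 140 min, parallel: 49 min, saved: 91 min`, in line with the 1.5 hour figure. The totals 298 = 394 - 97 + 1 and 199 = 298 - 100 + 1 show that each pass's output is added back to the pool of segments, so a real parallel scheme would need to pick disjoint inputs for the concurrent passes.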

> multi-threaded merge phase
> --------------------------
>
>                 Key: HADOOP-4730
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4730
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.1
>            Reporter: Christian Kunz
>
> Doing merges in multiple threads (when enough cores are available, which is a monitoring issue) could ideally cut the time spent merging by a factor equal to the number of threads.
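
A minimal sketch of the multi-threaded idea, under stated assumptions: sorted in-memory `List<Integer>` values stand in for on-disk segments, and the names `ParallelMergeSketch`, `kWayMerge` and `parallelMerge` are hypothetical, not part of Hadoop's `Merger` API. In each round, disjoint groups of up to `factor` segments are merged concurrently on a fixed thread pool; a single final pass then merges what is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: in-memory sorted lists stand in for on-disk segments.
public class ParallelMergeSketch {

    // k-way merge of sorted runs using a min-heap of {value, runIndex, position}.
    static List<Integer> kWayMerge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    // Merges disjoint groups of up to `factor` segments concurrently, round by round,
    // until at most `factor` segments remain, then does one final merge.
    static List<Integer> parallelMerge(List<List<Integer>> segments, int factor, int threads)
            throws InterruptedException, ExecutionException {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<List<Integer>> current = segments;
            while (current.size() > factor) {
                List<Future<List<Integer>>> futures = new ArrayList<>();
                for (int i = 0; i < current.size(); i += factor) {
                    List<List<Integer>> group = current.subList(i, Math.min(i + factor, current.size()));
                    futures.add(pool.submit(() -> kWayMerge(group))); // one task per group
                }
                List<List<Integer>> next = new ArrayList<>();
                for (Future<List<Integer>> f : futures) {
                    next.add(f.get()); // wait for every group in this round
                }
                current = next;
            }
            return kWayMerge(current); // final pass, single-threaded as before
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> segments = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            List<Integer> seg = new ArrayList<>();
            for (int v = s; v < 100; v += 10) {
                seg.add(v); // each segment is sorted
            }
            segments.add(seg);
        }
        List<Integer> merged = parallelMerge(segments, 3, 4);
        System.out.println(merged.size() + " " + merged.get(0) + " " + merged.get(99)); // prints "100 0 99"
    }
}
```

Splitting into equal groups from the front is a simplification chosen for the sketch; it does not reproduce the first-pass sizing seen in the log. In practice the achievable speedup may be limited by disk bandwidth as well as by the number of cores.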

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

