hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3604) Reduce stuck at shuffling phase
Date Fri, 27 Jun 2008 10:25:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608710#action_12608710 ]

devaraj edited comment on HADOOP-3604 at 6/27/08 3:24 AM:
--------------------------------------------------------------

Some comments:
1) In ReduceTask, shouldn't you call mapoutput.discard when there is an exception while reading/writing the map output?
2) Shouldn't the line {noformat} workmem = (*env)->GetDirectBufferAddress(env, working_memory_buf); {noformat} in LzoCompressor.java be within a LOCK & UNLOCK?
3) In the codec implementations, you lock the call to GetDirectBufferAddress on a per-class level. I wonder whether it makes more sense to have all the calls to GetDirectBufferAddress synchronized on a single global lock. Wouldn't that be the safest thing to do, since we use both the compression and decompression classes in the same JVM, although perhaps always at different times?
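Points 1 and 3 above can be illustrated with a minimal Java sketch. This is not the code from the patch: every name here (ShuffleReviewSketch, MapOutputSketch, copyInto, GLOBAL_BUFFER_LOCK, CompressorSketch, DecompressorSketch) is hypothetical, and the shared counter only stands in for the native call that the real codecs would guard. It assumes that discarding a map output is what releases its reserved space, and that one lock object visible to both codec classes is acceptable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShuffleReviewSketch {

    /** Hypothetical stand-in for a shuffled map output that holds reserved buffer space. */
    static final class MapOutputSketch {
        private boolean discarded;
        void discard() { discarded = true; }
        boolean isDiscarded() { return discarded; }
    }

    /** Point 1: if reading the map output fails, discard it before rethrowing. */
    static void copyInto(MapOutputSketch out, InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        try {
            int off = 0;
            while (off < length) {
                int n = in.read(buf, off, length - off);
                if (n < 0) {
                    throw new IOException("Incomplete map output: " + off + " of " + length + " bytes");
                }
                off += n;
            }
        } catch (IOException e) {
            out.discard();   // release whatever the output had reserved
            throw e;
        }
    }

    /** Point 3: one lock shared by all codec classes instead of one lock per class. */
    static final Object GLOBAL_BUFFER_LOCK = new Object();

    static final class CompressorSketch {
        void touch(int[] shared) {
            synchronized (GLOBAL_BUFFER_LOCK) { shared[0]++; }  // stands in for the guarded native call
        }
    }

    static final class DecompressorSketch {
        void touch(int[] shared) {
            synchronized (GLOBAL_BUFFER_LOCK) { shared[0]++; }  // same lock as the compressor
        }
    }

    /** Runs a compressor and a decompressor concurrently against the same shared state. */
    static int runBoth(int iterations) throws InterruptedException {
        final int[] shared = {0};
        final CompressorSketch c = new CompressorSketch();
        final DecompressorSketch d = new DecompressorSketch();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) c.touch(shared); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) d.touch(shared); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared[0];
    }

    public static void main(String[] args) throws Exception {
        MapOutputSketch out = new MapOutputSketch();
        try {
            // Only 4 bytes are available although 16 are expected, so the copy fails.
            copyInto(out, new ByteArrayInputStream(new byte[4]), 16);
        } catch (IOException e) {
            System.out.println("copy failed, discarded=" + out.isDiscarded());
        }
        System.out.println("updates under the global lock: " + runBoth(10000));
    }
}
```

With per-class locks, the compressor and the decompressor would each synchronize on a different monitor and could still reach the guarded call at the same time. A single lock rules that out, at the cost of serializing the two.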


      was (Author: devaraj):
    Some comments:
1) In ReduceTask, shouldn't you call mapoutput.discard when there is an exception while reading/writing the map output?
2) Shouldn't the line {noformat} workmem = (*env)->GetDirectBufferAddress(env, working_memory_buf); {noformat} in LzoCompressor.java be within a LOCK & UNLOCK?
3) In the codec implementations, you lock the call to GetDirectBufferAddress on a per-class level. I wonder whether it makes more sense to have all the calls to GetDirectBufferAddress synchronized on a single global lock. Wouldn't that be the safest thing to do?

  
> Reduce stuck at shuffling phase
> -------------------------------
>
>                 Key: HADOOP-3604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3604
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Runping Qi
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-3604_0_20080623.patch, HADOOP-3604_1_20080624.patch, HADOOP-3604_1_20080624.patch, HADOOP-3604_2_20080625.patch, stack.txt
>
>
> I was running gridmix with Hadoop 0.18.
> I set the map output compression to true.
> Most of the jobs completed just fine.
> Three jobs, however, got stuck.
> Each has one reducer stuck at shuffling phase.
> Here is the log:
> 2008-06-20 00:06:01,264 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
> 2008-06-20 00:06:01,415 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/bin/cat]
> 2008-06-20 00:06:01,463 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=134217728, MaxSingleShuffleLimit=33554432
> 2008-06-20 00:06:01,474 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2008-06-20 00:06:01,475 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2008-06-20 00:06:01,476 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,477 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,477 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,478 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,478 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,486 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,486 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,487 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,487 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,488 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,488 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,489 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,489 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,489 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,493 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,496 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,496 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,496 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,497 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,497 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2008-06-20 00:06:01,500 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Thread started: Thread for merging on-disk files
> 2008-06-20 00:06:01,500 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Thread waiting: Thread for merging on-disk files
> 2008-06-20 00:06:01,502 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Need another 270 map output(s) where 0 is already in progress
> 2008-06-20 00:06:01,503 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Thread started: Thread for merging in memory files
> 2008-06-20 00:06:01,503 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0: Got 0 new map-outputs & number of known map outputs is 0
> 2008-06-20 00:06:01,504 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
> 2008-06-20 00:06:06,654 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0: Got 269 new map-outputs & number of known map outputs is 269
> 2008-06-20 00:06:06,656 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200806192318_0450_r_000016_0 Scheduled 229 of 269 known outputs (0 slow hosts and 40 dup hosts)
> 2008-06-20 00:06:07,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 846183 bytes (210104 raw bytes) into RAM-FS from attempt_200806192318_0450_m_000089_0
> 2008-06-20 00:06:07,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 820890 bytes (204371 raw bytes) into RAM-FS from attempt_200806192318_0450_m_000083_0
> 2008-06-20 00:06:07,166 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 835672 bytes (208085 raw bytes) into RAM-FS from attempt_200806192318_0450_m_000122_0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

