hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew McNabb (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-950) Job completes but command doesn't return
Date Mon, 29 Jan 2007 21:36:49 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468419
] 

Andrew McNabb commented on HADOOP-950:
--------------------------------------

I'm doing a test right now.  The output of hadoop-streaming is being sent to a pipe, so one
theory for what's happening is that hadoop-streaming blocks when it writes because the output
buffer is full.

If this is the cause, then there's no bug in hadoop-streaming per se, but the problem would
go away if there were an option for quiet output (don't write out the percent complete, etc.).

I'll add another comment to this when I find out for sure.  Thanks.

> Job completes but command doesn't return
> ----------------------------------------
>
>                 Key: HADOOP-950
>                 URL: https://issues.apache.org/jira/browse/HADOOP-950
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.10.1
>         Environment: Linux, Java 1.6.0-b105
>            Reporter: Andrew McNabb
>
> I've had a job submission command hang on many different occasions.  I can't tell exactly
what makes it complete some times and hang others.  Here's some information about one time
when it hanged.
> I started a job at 12:40.  Here is the info from 'ps aux' including the full command
line:
> /ibrix/home/awm27/jdk1.6.0/bin/java -Xmx1000m -Dhadoop.log.dir=/ibrix/home/awm27/hadoop/logs/log.f17880
-Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/ibrix/home/awm27/hadoop/bin/.. -Dhadoop.id.str=
-Dhadoop.root.logger=INFO,console -Djava.library.path=/ibrix/home/awm27/hadoop/bin/../lib/native/Linux-amd64-64
-classpath /ibrix/home/awm27/hadoop/dynamic_conf/tmp.FkrUc17883:/ibrix/home/awm27/jdk1.6.0/lib/tools.jar:/ibrix/home/awm27/hadoop/bin/../build/classes:/ibrix/home/awm27/hadoop/bin/../build:/ibrix/home/awm27/hadoop/bin/../build/test/classes:/ibrix/home/awm27/hadoop/bin/..:/ibrix/home/awm27/hadoop/bin/../hadoop-0.10.1-core.jar:/ibrix/home/awm27/hadoop/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/ibrix/home/awm27/hadoop/bin/../lib/commons-codec-1.3.jar:/ibrix/home/awm27/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/ibrix/home/awm27/hadoop/bin/../lib/commons-logging-1.0.4.jar:/ibrix/home/awm27/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/ibrix/home/awm27/hadoop/bin/../lib/jets3t.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-5.1.4.jar:/ibrix/home/awm27/hadoop/bin/../lib/junit-3.8.1.jar:/ibrix/home/awm27/hadoop/bin/../lib/log4j-1.2.13.jar:/ibrix/home/awm27/hadoop/bin/../lib/lucene-core-1.9.1.jar:/ibrix/home/awm27/hadoop/bin/../lib/servlet-api.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-ext/ant.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-ext/commons-el.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-ext/jasper-compiler.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-ext/jasper-runtime.jar:/ibrix/home/awm27/hadoop/bin/../lib/jetty-ext/jsp-api.jar
org.apache.hadoop.util.RunJar /ibrix/home/awm27/hadoop/build/hadoop-streaming.jar -jobconf
mapred.job.name="MRPSO_RBF_2" -input MRPSO_RBF/MRPSO_RBF_1 -output MRPSO_RBF/MRPSO_RBF_2 -mapper
/ibrix/home/awm27/svn/mrpso/mapper.py -combiner /ibrix/home/awm27/svn/mrpso/reducer.py -reducer
/ibrix/home/awm27/svn/mrpso/reducer.py -cmdenv PYTHONPATH=/ibrix/home/awm27/svn/python -cmdenv
MRPSO_FUNCTION=RBF -cmdenv MRPSO_DATAFILES=/tmp/tmpT9M4cq
> At the time of submission, the jobtracker reported receiving the job and began processing
it.  The first line in this part of the logs is:
> 2007-01-29 12:40:44,072 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached
task tip_0002_m_000002
> At 13:16, the job completed, with the following normal log messages:
> 2007-01-29 13:16:36,115 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0002_r_000001_0'
has completed tip_0002_r_000001 successfully.
> 2007-01-29 13:16:36,117 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0002_r_000001_0'
has completed.
> 2007-01-29 13:16:36,566 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0002_r_000002_0'
has completed tip_0002_r_000002 successfully.
> 2007-01-29 13:16:36,566 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0002_r_000002_0'
has completed.
> 2007-01-29 13:16:36,879 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0002_r_000003_0'
has completed tip_0002_r_000003 successfully.
> 2007-01-29 13:16:36,879 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0002_r_000003_0'
has completed.
> 2007-01-29 13:16:41,808 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0002_r_000000_0'
has completed tip_0002_r_000000 successfully.
> 2007-01-29 13:16:41,930 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0002_r_000000_0'
has completed.
> 2007-01-29 13:16:41,940 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0002 has
completed successfully.
> 2007-01-29 13:16:41,942 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task
'task_0002_m_000002_0' from 'tracker_m4b-3-2.local:50050'
> ...
> 2007-01-29 13:16:50,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task
'task_0002_m_000016_0' from 'tracker_m4b-3-8.local:50050'
> 2007-01-29 13:16:50,661 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task
'task_0002_m_000044_0' from 'tracker_m4b-3-8.local:50050'
> 2007-01-29 13:16:50,662 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task
'task_0002_m_000046_0' from 'tracker_m4b-3-8.local:50050'
> At 14:03, the hadoop-streaming command (process 21261) that started at 12:40 was still
running and using 0.0% of the CPU.  The output of the command ended with:
> 07/01/29 13:16:41 INFO streaming.StreamJob:  map 100%  reduce 93%
> 07/01/29 13:16:42 INFO streaming.StreamJob:  map 100%  reduce 100%
> 07/01/29 13:16:42 INFO streaming.StreamJob: Job complete: job_0002
> It seems odd that hadoop-streaming would hang instead of returning.
> I hope this information is helpful.  Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message