hadoop-common-user mailing list archives

From: Upendra Dadi <ud...@gmu.edu>
Subject: problems with Hadoop Streaming
Date: Thu, 03 Dec 2009 14:01:19 GMT
Hi,
  I am having some issues with Hadoop Streaming when the size of the value is large. Here is a code snippet from the mapper program, written in C++:


    // hSrcDS and url are defined earlier in main(); generateString64()
    // builds the output value (about 33 MB per map task) into outTif.
    std::string outTif;
    generateString64(hSrcDS, outTif);
    // One streaming record: key, tab, value, newline.
    std::cout << url << '\t' << outTif << std::endl;
    return (EXIT_SUCCESS);
}
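
In case it helps anyone reproduce this without GDAL, I expect a dummy mapper along these lines would exercise the same path; the "dummy" key and the default size below are just placeholders:

    #include <cstdlib>
    #include <iostream>
    #include <string>

    int main(int argc, char** argv) {
        // Drain stdin (the input split); this mapper ignores its input.
        std::string line;
        while (std::getline(std::cin, line)) {
        }
        // Value size in bytes; 30000000 is roughly where the real job fails.
        std::size_t n = (argc > 1) ? std::strtoul(argv[1], 0, 10) : 30000000;
        // One streaming record: key, tab, value, newline.
        std::cout << "dummy" << '\t' << std::string(n, 'A') << std::endl;
        return EXIT_SUCCESS;
    }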


The outTif string is about the same size for every map task, roughly 33 MB. When I replace outTif with outTif.substr(0, 20000000), the job completes fine, though it takes a long time; it obviously works fine for smaller values. But if I replace it with outTif.substr(0, 30000000), I get the following output:

09/12/03 00:35:22 INFO streaming.StreamJob: Running job: job_200912021427_0027
09/12/03 00:35:22 INFO streaming.StreamJob: To kill this job, run:
09/12/03 00:35:22 INFO streaming.StreamJob: /home/upendra/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_200912021427_0027
09/12/03 00:35:22 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_200912021427_0027
09/12/03 00:35:23 INFO streaming.StreamJob:  map 0%  reduce 0%
09/12/03 00:35:39 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 00:35:49 INFO streaming.StreamJob:  map 100%  reduce 0%
09/12/03 00:45:58 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 00:46:34 INFO streaming.StreamJob:  map 100%  reduce 0%
09/12/03 00:46:41 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 00:47:13 INFO streaming.StreamJob:  map 100%  reduce 0%
09/12/03 00:57:00 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 00:57:36 INFO streaming.StreamJob:  map 0%  reduce 0%
09/12/03 00:57:52 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 00:57:55 INFO streaming.StreamJob:  map 100%  reduce 0%
09/12/03 01:08:19 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 01:08:47 INFO streaming.StreamJob:  map 0%  reduce 0%
09/12/03 01:08:59 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 01:09:03 INFO streaming.StreamJob:  map 100%  reduce 0%
09/12/03 01:19:15 INFO streaming.StreamJob:  map 50%  reduce 0%
09/12/03 01:19:27 INFO streaming.StreamJob:  map 0%  reduce 0%
09/12/03 01:19:48 INFO streaming.StreamJob:  map 100%  reduce 100%
09/12/03 01:19:48 INFO streaming.StreamJob: To kill this job, run:
09/12/03 01:19:48 INFO streaming.StreamJob: /home/upendra/hadoop-0.20.1/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_200912021427_0027
09/12/03 01:19:48 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_200912021427_0027
09/12/03 01:19:49 ERROR streaming.StreamJob: Job not Successful!
09/12/03 01:19:49 INFO streaming.StreamJob: killJob...
Streaming Job Failed!


Here is a snippet from the syslog:


2009-12-03 00:57:48,314 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2009-12-03 00:57:48,579 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2009-12-03 00:57:50,876 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/tmp/hadoop-upendra/mapred/local/taskTracker/jobcache/job_200912021427_0027/attempt_200912021427_0027_m_000000_2/work/./gdalloadmap]
2009-12-03 00:57:51,198 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2009-12-03 01:08:18,789 WARN org.apache.hadoop.mapred.TaskRunner: Parent died.  Exiting attempt_200912021427_0027_m_000000_2
2009-12-03 01:08:42,071 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=CLEANUP, sessionId=
2009-12-03 01:08:42,931 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
2009-12-03 01:08:44,144 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_200912021427_0027_m_000000_2 is done. And is in the process of commiting
2009-12-03 01:08:44,560 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_200912021427_0027_m_000000_2' done.
LOG_DIR:attempt_200912021427_0027_m_000000_3


I don't know what is happening when the size of the value increases. There is no reducer (-D mapred.reduce.tasks=0) in this job. I am guessing there is some size limit or time limit causing the problem: the map attempt goes quiet at 00:57:51 and the "Parent died" warning shows up at 01:08:18, roughly ten minutes later, and the job log above shows the same ten-minute rhythm between retries. I am running this job on a single node in pseudo-distributed mode.
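
If it really is a time limit, the ten-minute spacing matches the default mapred.task.timeout of 600000 ms, and as I understand it a task avoids that timeout by reporting progress. The streaming docs mention that a line of the form reporter:status:<message> written to stderr counts as a status update, so one workaround I am considering is to heartbeat while the large value goes out. A minimal sketch (emitRecord() is my own hypothetical helper):

    #include <algorithm>
    #include <iostream>
    #include <string>

    // Hypothetical helper: write one key/value record, flushing the value
    // in chunks and sending a status line to stderr after each chunk so
    // the task is not considered idle while the 30+ MB value is written.
    static void emitRecord(const std::string& key, const std::string& value,
                           std::size_t chunk = 1 << 20) {
        std::cout << key << '\t';
        for (std::size_t off = 0; off < value.size(); off += chunk) {
            std::size_t n = std::min(chunk, value.size() - off);
            std::cout.write(value.data() + off, static_cast<std::streamsize>(n));
            std::cout.flush();
            // "reporter:status:<message>" is the streaming status hook.
            std::cerr << "reporter:status:wrote " << (off + n) << " bytes" << std::endl;
        }
        std::cout << '\n';  // end of record
    }

Alternatively, I suppose I could just raise the limit with -D mapred.task.timeout=1800000 on the streaming command line, if that is indeed the relevant knob.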
Any guess on what might be going wrong? Also, what parameters should I modify to improve performance when the values are this large? I have very few input key-value pairs, but the values themselves are large. Any help is much appreciated. Thank you.

Upendra
