hadoop-mapreduce-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Hadoop Streaming job error - Need help urgent
Date Mon, 22 Apr 2013 19:17:57 GMT
OK, great.  With the change to "$input_path", you've made progress.

Now it's actually submitting the job, but something is causing the map
tasks to fail.  Usually, this is some kind of bug in user code, so you'll
need to do some further investigation on your side.  I expect the tracking
URL mentioned in the output above will give you some clues.  That should
also steer you towards the individual task log outputs.
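
If it helps, here is one way to dig those up from the command line (a
sketch for Hadoop 1.x, using the job ID from your output and the variables
already defined in your script):

    # Print the task completion events the JobTracker recorded for the job;
    # failed attempts show up here along with the TaskTracker that ran them.
    $hadoop_bin job -events job_201304221215_0002 0 100

    # On a single-node setup, each attempt's stdout/stderr/syslog also land
    # on local disk under the userlogs directory.
    ls $hadoop_home/logs/userlogs/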

--Chris



On Mon, Apr 22, 2013 at 12:04 PM, prithvi dammalapati <
d.prithvi999@gmail.com> wrote:

> java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
> hadoop_home=/usr/local/hadoop/hadoop-1.0.4
> hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
> hadoop_bin=$hadoop_home/bin/hadoop
> hadoop_config=$hadoop_home/conf
> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
> #task specific parameters
> source_code=BetweennessCentrality.java
> jar_file=BetweennessCentrality.jar
> main_class=mslab.BetweennessCentrality
> num_of_node=38012
> num_of_mapper=100
> num_of_reducer=8
> input_path=/data/dblp_author_conf_adj.txt
> output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
> rm build -rf
> mkdir build
> $java_home/bin/javac -d build -classpath .:$hadoop_lib src/mslab/$source_code
> rm $jar_file -f
> $java_home/bin/jar -cf $jar_file -C build/ .
> $hadoop_bin --config $hadoop_config fs -rmr $output_path
> $hadoop_bin --config $hadoop_config jar $jar_file $main_class $num_of_node       $num_of_mapper
>
> rm brandes_mapper
>
> g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
> $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D mapred.task.timeout=0 -D
> mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" -D mapred.reduce.tasks=$num_of_reducer
> -input input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file brandes_mapper
> -file src/mslab/BC_reducer.py -file src/mslab/MapReduceUtil.py -file $input_path -mapper "./brandes_mapper
> $input_path $num_of_node" -reducer "./BC_reducer.py"
>
> After running this code, I get the following error
> 13/04/22 12:29:44 INFO streaming.StreamJob:  map 0%  reduce 0%
> 13/04/22 12:30:01 INFO streaming.StreamJob:  map 20%  reduce 0%
> 13/04/22 12:30:10 INFO streaming.StreamJob:  map 40%  reduce 0%
> 13/04/22 12:30:13 INFO streaming.StreamJob:  map 40%  reduce 2%
> 13/04/22 12:30:16 INFO streaming.StreamJob:  map 40%  reduce 13%
> 13/04/22 12:30:19 INFO streaming.StreamJob:  map 60%  reduce 13%
> 13/04/22 12:30:28 INFO streaming.StreamJob:  map 60%  reduce 17%
> 13/04/22 12:30:31 INFO streaming.StreamJob:  map 60%  reduce 20%
> 13/04/22 12:31:01 INFO streaming.StreamJob:  map 100%  reduce 100%
> 13/04/22 12:31:01 INFO streaming.StreamJob: To kill this job, run:
> 13/04/22 12:31:01 INFO streaming.StreamJob: /usr/local/hadoop/hadoop-1.0.4/libexec/../bin/hadoop
> job  -Dmapred.job.tracker=localhost:54311 -kill job_201304221215_0002
> 13/04/22 12:31:01 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201304221215_0002
> 13/04/22 12:31:01 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded
> allowed limit. FailedCount: 1. LastFailedTask: task_201304221215_0002_m_000006
> 13/04/22 12:31:01 INFO streaming.StreamJob: killJob...
>
> Even if I reduce num_of_node to 10 and num_of_mapper to 10, I get the same error.
> Can someone help me solve this error?
>
> Any help is appreciated
>
> Thanks
>
> Prithvi
>
>
>
> On Mon, Apr 22, 2013 at 12:49 PM, Chris Nauroth <cnauroth@hortonworks.com>wrote:
>
>> (Moving to user list, hdfs-dev bcc'd.)
>>
>> Hi Prithvi,
>>
>> From a quick scan, it looks to me like one of your commands ends up using
>> "input_path" as a string literal instead of expanding to the value of the
>> input_path variable.  I've pasted the command below.  Notice that one of
>> the -file options uses "input_path" instead of "$input_path".
>>
>> Is that the problem?
>>
>> Hope this helps,
>> --Chris
>>
>>
>>
>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>> -D mapred.reduce.tasks=$num_of_reducer -input
>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path
>> -file brandes_mapper -file src/mslab/BC_reducer.py -file
>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>> $input_path $num_of_node" -reducer "./BC_reducer.py"
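>>
>> For comparison, the corrected option would presumably read (same variables
>> as in your script, so the shell expands it before the streaming jar ever
>> sees it):
>>
>>     -file $input_path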
>>
>>
>>
>> On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
>> d.prithvi999@gmail.com> wrote:
>>
>>> I have the following hadoop code to find the betweenness centrality of a
>>> graph
>>>
>>>     java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
>>>     hadoop_home=/usr/local/hadoop/hadoop-1.0.4
>>>     hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
>>>     hadoop_bin=$hadoop_home/bin/hadoop
>>>     hadoop_config=$hadoop_home/conf
>>>
>>> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
>>>     #task specific parameters
>>>     source_code=BetweennessCentrality.java
>>>     jar_file=BetweennessCentrality.jar
>>>     main_class=mslab.BetweennessCentrality
>>>     num_of_node=38012
>>>     num_of_mapper=100
>>>     num_of_reducer=8
>>>     input_path=/data/dblp_author_conf_adj.txt
>>>     output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
>>>     rm build -rf
>>>     mkdir build
>>>     $java_home/bin/javac -d build -classpath .:$hadoop_lib
>>> src/mslab/$source_code
>>>     rm $jar_file -f
>>>     $java_home/bin/jar -cf $jar_file -C build/ .
>>>     $hadoop_bin --config $hadoop_config fs -rmr $output_path
>>>     $hadoop_bin --config $hadoop_config jar $jar_file $main_class
>>> $num_of_node       $num_of_mapper
>>>
>>>     rm brandes_mapper
>>>
>>>     g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
>>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>>> -D mapred.reduce.tasks=$num_of_reducer -input
>>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
>>> brandes_mapper -file src/mslab/BC_reducer.py -file
>>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>>> $input_path $num_of_node" -reducer "./BC_reducer.py"
>>>
>>> When I run this code in a shell script, I get the following error:
>>>
>>>     Warning: $HADOOP_HOME is deprecated.
>>>     File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist,
>>> or is not readable.
>>>     Streaming Command Failed!
>>>
>>> but the file exists at the specified path:
>>>
>>>     /Downloads/mgmf/trunk/data$ ls
>>>     dblp_author_conf_adj.txt
>>>
>>> I have also added the input file into HDFS using
>>>
>>>     /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
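>>>
>>> To double-check that the upload landed where input_path points, something
>>> like this should list it (a sketch; assuming /data is the HDFS destination
>>> used above):
>>>
>>>     /usr/local/hadoop$ bin/hadoop dfs -ls /data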
>>>
>>> Can someone help me solve this problem?
>>>
>>>
>>> Any help is appreciated,
>>> Thanks
>>> Prithvi
>>>
>>
>>
>
