hadoop-mapreduce-user mailing list archives

From prithvi dammalapati <d.prithvi...@gmail.com>
Subject Re: Hadoop Streaming job error - Need help urgent
Date Mon, 22 Apr 2013 19:04:17 GMT
java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
hadoop_home=/usr/local/hadoop/hadoop-1.0.4
hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
hadoop_bin=$hadoop_home/bin/hadoop
hadoop_config=$hadoop_home/conf
hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
#task specific parameters
source_code=BetweennessCentrality.java
jar_file=BetweennessCentrality.jar
main_class=mslab.BetweennessCentrality
num_of_node=38012
num_of_mapper=100
num_of_reducer=8
input_path=/data/dblp_author_conf_adj.txt
output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
rm build -rf
mkdir build
$java_home/bin/javac -d build -classpath .:$hadoop_lib src/mslab/$source_code
rm $jar_file -f
$java_home/bin/jar -cf $jar_file -C build/ .
$hadoop_bin --config $hadoop_config fs -rmr $output_path
$hadoop_bin --config $hadoop_config jar $jar_file $main_class $num_of_node $num_of_mapper

rm brandes_mapper

g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
$hadoop_bin --config $hadoop_config jar $hadoop_streaming \
    -D mapred.task.timeout=0 \
    -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" \
    -D mapred.reduce.tasks=$num_of_reducer \
    -input input_BC_N$((num_of_node))_M$((num_of_mapper)) \
    -output $output_path \
    -file brandes_mapper \
    -file src/mslab/BC_reducer.py \
    -file src/mslab/MapReduceUtil.py \
    -file $input_path \
    -mapper "./brandes_mapper $input_path $num_of_node" \
    -reducer "./BC_reducer.py"
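
(Side note on the -file options above: Streaming ships each file listed with -file into the task's working directory, so inside the task a shipped file is referenced by its basename rather than by its original local path. A minimal sketch with hypothetical names, not this job:

    # Hypothetical streaming job illustrating -file: the shipped data file is
    # visible to the mapper by basename in the task's working directory.
    data_file=/data/lookup.txt   # made-up local path on the submitting machine
    $hadoop_bin --config $hadoop_config jar $hadoop_streaming \
        -input some_hdfs_input_dir \
        -output some_hdfs_output_dir \
        -file "$data_file" \
        -file my_mapper.py \
        -mapper "./my_mapper.py $(basename "$data_file")" \
        -reducer cat

The job above instead passes $input_path, i.e. /data/dblp_author_conf_adj.txt, as the mapper's first argument.)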

After running the script above, I get the following error:
13/04/22 12:29:44 INFO streaming.StreamJob:  map 0%  reduce 0%
13/04/22 12:30:01 INFO streaming.StreamJob:  map 20%  reduce 0%
13/04/22 12:30:10 INFO streaming.StreamJob:  map 40%  reduce 0%
13/04/22 12:30:13 INFO streaming.StreamJob:  map 40%  reduce 2%
13/04/22 12:30:16 INFO streaming.StreamJob:  map 40%  reduce 13%
13/04/22 12:30:19 INFO streaming.StreamJob:  map 60%  reduce 13%
13/04/22 12:30:28 INFO streaming.StreamJob:  map 60%  reduce 17%
13/04/22 12:30:31 INFO streaming.StreamJob:  map 60%  reduce 20%
13/04/22 12:31:01 INFO streaming.StreamJob:  map 100%  reduce 100%
13/04/22 12:31:01 INFO streaming.StreamJob: To kill this job, run:
13/04/22 12:31:01 INFO streaming.StreamJob:
/usr/local/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job
-Dmapred.job.tracker=localhost:54311 -kill job_201304221215_0002
13/04/22 12:31:01 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201304221215_0002
13/04/22 12:31:01 ERROR streaming.StreamJob: Job not successful.
Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
LastFailedTask: task_201304221215_0002_m_000006
13/04/22 12:31:01 INFO streaming.StreamJob: killJob...
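
(The client output above only reports that a map task failed; the actual cause should be in that attempt's own stderr, reachable through the Tracking URL (job -> failed task -> attempt -> logs) or directly on the tasktracker's disk. A sketch for a default single-node 1.0.4 install; the exact userlogs layout may vary between versions:

    # Locate and print the stderr of the failed map attempt
    # (adjust the path if HADOOP_LOG_DIR is overridden).
    log_dir=/usr/local/hadoop/hadoop-1.0.4/logs/userlogs
    find "$log_dir" -type f -name stderr -path '*_m_000006*' -exec tail -n 50 {} +
)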

Even if I reduce num_of_node to 10 and num_of_mapper to 10, I get
the same error. Can someone help me solve this?
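
(As a quick way to surface crashes in the mapper or reducer without involving Hadoop, the streaming pipeline can usually be smoke-tested locally by piping a few input records through it; a sketch assuming a small sample of the streaming input is saved locally as sample.txt, a made-up name:

    # Local smoke test: records on stdin -> mapper -> sort -> reducer,
    # run from the same script so $input_path and $num_of_node are set.
    head -n 10 sample.txt | ./brandes_mapper "$input_path" "$num_of_node" | sort | python src/mslab/BC_reducer.py
)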

Any help is appreciated

Thanks

Prithvi



On Mon, Apr 22, 2013 at 12:49 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:

> (Moving to user list, hdfs-dev bcc'd.)
>
> Hi Prithvi,
>
> From a quick scan, it looks to me like one of your commands ends up using
> "input_path" as a string literal instead of replacing with the value of the
> input_path variable.  I've pasted the command below.  Notice that one of
> the -file options used "input_path" instead of "$input_path".
>
> Is that the problem?
>
> Hope this helps,
> --Chris
>
>
>
>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
> -D mapred.reduce.tasks=$num_of_reducer -input
> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
> brandes_mapper -file src/mslab/BC_reducer.py -file
> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
> $input_path $num_of_node" -reducer "./BC_reducer.py"
>
>
>
> On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
> d.prithvi999@gmail.com> wrote:
>
>> I have the following hadoop code to find the betweenness centrality of a
>> graph
>>
>>     java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
>>     hadoop_home=/usr/local/hadoop/hadoop-1.0.4
>>     hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
>>     hadoop_bin=$hadoop_home/bin/hadoop
>>     hadoop_config=$hadoop_home/conf
>>
>> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
>>     #task specific parameters
>>     source_code=BetweennessCentrality.java
>>     jar_file=BetweennessCentrality.jar
>>     main_class=mslab.BetweennessCentrality
>>     num_of_node=38012
>>     num_of_mapper=100
>>     num_of_reducer=8
>>     input_path=/data/dblp_author_conf_adj.txt
>>     output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
>>     rm build -rf
>>     mkdir build
>>     $java_home/bin/javac -d build -classpath .:$hadoop_lib
>> src/mslab/$source_code
>>     rm $jar_file -f
>>     $java_home/bin/jar -cf $jar_file -C build/ .
>>     $hadoop_bin --config $hadoop_config fs -rmr $output_path
>>     $hadoop_bin --config $hadoop_config jar $jar_file $main_class
>> $num_of_node       $num_of_mapper
>>
>>     rm brandes_mapper
>>
>>     g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
>>     $hadoop_bin --config $hadoop_config jar $hadoop_streaming -D
>> mapred.task.timeout=0 -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))"
>> -D mapred.reduce.tasks=$num_of_reducer -input
>> input_BC_N$((num_of_node))_M$((num_of_mapper)) -output $output_path -file
>> brandes_mapper -file src/mslab/BC_reducer.py -file
>> src/mslab/MapReduceUtil.py -file input_path -mapper "./brandes_mapper
>> $input_path $num_of_node" -reducer "./BC_reducer.py"
>>
>> When I run this code in a shell script, I get the following errors:
>>
>>     Warning: $HADOOP_HOME is deprecated.
>>     File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist, or
>> is not readable.
>>     Streaming Command Failed!
>>
>> but the file exists at the specified path:
>>
>>     /Downloads/mgmf/trunk/data$ ls
>>     dblp_author_conf_adj.txt
>>
>> I have also added the input file into HDFS using
>>
>>     /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
>>
>> Can someone help me solve this problem?
>>
>>
>> Any help is appreciated,
>> Thanks
>> Prithvi
>>
>
>
