hadoop-common-user mailing list archives

From Samuel LEMOINE <samuel.lemo...@lingway.com>
Subject problem scaling cluster up to 2 slaves
Date Wed, 01 Aug 2007 12:13:13 GMT
Hi everyone!

I'm still trying to understand how Hadoop works, and what it offers 
for parallelizing Java applications (especially Lucene-based ones).
For the moment, I've focused my efforts on the examples given (Grep and 
WordCount).
I've managed to make both of them work with Hadoop running with a 
"file:///" namenode.
When I try in DFS mode (namenode configured to my local IP), Grep 
still works, but WordCount no longer does.
I can also make Grep work with one computer as the namenode and another 
as a single datanode (unsurprisingly, WordCount still doesn't work).

But as soon as I try to build some kind of mini "real" cluster, with a 
namenode and n datanodes, with n=2, the map/reduce job blocks. It doesn't 
produce any error, and the log files don't show anything strange; the 
execution just freezes shortly after the reduce task begins. I've 
noticed that the reduce task begins before the end of the map task when 
2 slaves are available, which is not the case with only 1 slave. I know 
that this is the expected behaviour, but I suspect it is the cause of 
the freeze.
The execution doesn't proceed exactly the same way each time: sometimes 
it blocks at 91% map/31% reduce, other times at 89% map/15% reduce, but 
it's always at the beginning of the reduce task. I've tried letting it 
run for a whole night in case it was just very slow, but it didn't go 
any further.
The 2 slaves are nearly identical, and each of them works when it is the 
only slave.

I've been stuck at this point for about 5 days, with no leads left to 
explore. Any help or clue would be greatly appreciated. If someone knows 
a good way to monitor the remote Java tasks launched on the remote 
jobtrackers/tasktrackers, that's also welcome :)

Oh, one detail that shouldn't change much: the 2 slaves are virtual 
machines, with 256MB of RAM each. All access to them is through ssh 
(which gets quite tedious when you change the config files many times 
to try the different possibilities :D )


Samuel


PS: I promise to share my Hadoop experience as soon as it takes a 
consistent shape. I plan to write a tutorial for the company where I'm 
doing my internship, and then translate it into English for the wiki.
PS2: here are the console messages for a few of my attempts:

console messages while running Grep on a 1master-2slaves dfs architecture:

/opt/java/bin/java -Didea.launcher.port=7540 
-Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 
-classpath 
/opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar

com.intellij.rt.execution.application.AppMain 
com.lingway.hadoopScratchPad.Grep /user/hadoop/documents 
/user/hadoop/results blabla
07/08/01 11:14:01 INFO mapred.FileInputFormat: Total input paths to 
process : 19
07/08/01 11:14:03 INFO mapred.JobClient: Running job: job_0001
07/08/01 11:14:04 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 11:14:14 INFO mapred.JobClient:  map 10% reduce 0%
07/08/01 11:14:15 INFO mapred.JobClient:  map 21% reduce 0%
07/08/01 11:14:16 INFO mapred.JobClient:  map 26% reduce 0%
07/08/01 11:14:17 INFO mapred.JobClient:  map 31% reduce 0%
07/08/01 11:14:18 INFO mapred.JobClient:  map 42% reduce 0%
07/08/01 11:14:19 INFO mapred.JobClient:  map 47% reduce 0%
07/08/01 11:14:21 INFO mapred.JobClient:  map 57% reduce 0%
07/08/01 11:14:22 INFO mapred.JobClient:  map 63% reduce 0%
07/08/01 11:14:23 INFO mapred.JobClient:  map 73% reduce 0%
07/08/01 11:14:24 INFO mapred.JobClient:  map 78% reduce 0%
07/08/01 11:14:25 INFO mapred.JobClient:  map 84% reduce 0%
07/08/01 11:14:26 INFO mapred.JobClient:  map 89% reduce 0%
07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%

//and then doesn't go any further
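
A freeze like the one above can be spotted automatically from the JobClient console output. The following is only a minimal sketch in Python, not part of Hadoop: the function names, the sample text and the 600-second idle threshold are assumptions chosen for illustration. It parses the "map X% reduce Y%" lines, keeps the last one, and reports a stall when the job is unfinished and no progress line has appeared for longer than the threshold.

```python
import re
from datetime import datetime

# Matches JobClient progress lines such as:
#   07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%
PROGRESS = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapred\.JobClient:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def last_progress(log_text):
    """Return (timestamp, map_pct, reduce_pct) of the last progress line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS.match(line)
        if match:
            # Timestamps in the console output are year/month/day, two digits each.
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            last = (stamp, int(match.group(2)), int(match.group(3)))
    return last


def is_stalled(log_text, now, max_idle_seconds=600):
    """True if the job is unfinished and silent for more than max_idle_seconds.

    The 600-second default is an arbitrary illustrative threshold.
    """
    last = last_progress(log_text)
    if last is None:
        return False
    stamp, map_pct, reduce_pct = last
    finished = map_pct == 100 and reduce_pct == 100
    return not finished and (now - stamp).total_seconds() > max_idle_seconds


# Hypothetical sample: the last three progress lines of the Grep run above.
PROGRESS_SAMPLE = """\
07/08/01 11:14:25 INFO mapred.JobClient:  map 84% reduce 0%
07/08/01 11:14:26 INFO mapred.JobClient:  map 89% reduce 0%
07/08/01 11:14:34 INFO mapred.JobClient:  map 89% reduce 15%
"""

if __name__ == "__main__":
    print(last_progress(PROGRESS_SAMPLE))
    print(is_stalled(PROGRESS_SAMPLE, now=datetime(2007, 8, 1, 12, 0, 0)))
```

In practice the console output would be redirected to a file and passed in as log_text, with datetime.now() as the reference time.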




********************************************************************************************

console messages while running WordCount on a single-node dfs architecture:

/opt/java/bin/java -Didea.launcher.port=7537 
-Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 
-classpath 
/opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar

com.intellij.rt.execution.application.AppMain 
com.lingway.hadoopScratchPad.WordCount /user/hadoop/documents 
/user/hadoop/resultats
07/08/01 10:47:41 INFO mapred.FileInputFormat: Total input paths to 
process : 19
07/08/01 10:47:42 INFO mapred.JobClient: Running job: job_0004
07/08/01 10:47:43 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 10:47:48 INFO mapred.JobClient: Task Id : task_0004_m_000000_0, 
Status : FAILED
07/08/01 10:47:52 INFO mapred.JobClient: Task Id : task_0004_m_000001_0, 
Status : FAILED
07/08/01 10:47:57 INFO mapred.JobClient: Task Id : task_0004_m_000002_0, 
Status : FAILED
07/08/01 10:48:01 INFO mapred.JobClient: Task Id : task_0004_m_000003_0, 
Status : FAILED
07/08/01 10:48:05 INFO mapred.JobClient: Task Id : task_0004_m_000004_0, 
Status : FAILED
07/08/01 10:48:10 INFO mapred.JobClient: Task Id : task_0004_m_000005_0, 
Status : FAILED
07/08/01 10:48:14 INFO mapred.JobClient: Task Id : task_0004_m_000006_0, 
Status : FAILED
07/08/01 10:48:18 INFO mapred.JobClient: Task Id : task_0004_m_000007_0, 
Status : FAILED
07/08/01 10:48:24 INFO mapred.JobClient: Task Id : task_0004_m_000008_0, 
Status : FAILED
07/08/01 10:48:28 INFO mapred.JobClient: Task Id : task_0004_m_000009_0, 
Status : FAILED
07/08/01 10:48:32 INFO mapred.JobClient: Task Id : task_0004_m_000010_0, 
Status : FAILED
07/08/01 10:48:36 INFO mapred.JobClient: Task Id : task_0004_m_000011_0, 
Status : FAILED
07/08/01 10:48:41 INFO mapred.JobClient: Task Id : task_0004_m_000012_0, 
Status : FAILED
07/08/01 10:48:45 INFO mapred.JobClient: Task Id : task_0004_m_000013_0, 
Status : FAILED
07/08/01 10:48:49 INFO mapred.JobClient: Task Id : task_0004_m_000014_0, 
Status : FAILED
07/08/01 10:48:54 INFO mapred.JobClient: Task Id : task_0004_m_000015_0, 
Status : FAILED
07/08/01 10:48:58 INFO mapred.JobClient: Task Id : task_0004_m_000016_0, 
Status : FAILED
07/08/01 10:49:02 INFO mapred.JobClient: Task Id : task_0004_m_000017_0, 
Status : FAILED
07/08/01 10:49:06 INFO mapred.JobClient: Task Id : task_0004_m_000018_0, 
Status : FAILED
07/08/01 10:49:11 INFO mapred.JobClient: Task Id : task_0004_m_000000_1, 
Status : FAILED
07/08/01 10:49:15 INFO mapred.JobClient: Task Id : task_0004_m_000000_2, 
Status : FAILED
07/08/01 10:49:19 INFO mapred.JobClient:  map 100% reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at com.lingway.hadoopScratchPad.WordCount.main(WordCount.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1
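
To see at a glance which tasks failed repeatedly in output like the above, the failed attempts can be tallied per task. This is a minimal sketch in Python under stated assumptions: the function name and sample text are hypothetical, and the regular expression assumes the "Task Id : ..., Status : ..." layout shown in this message, including the line wrap between the task id and its status.

```python
import re
from collections import defaultdict

# Captures the task id without its attempt suffix, the attempt number, and the
# status. "\s*" after the comma also spans the line break added by the mail wrap.
ATTEMPT = re.compile(r"Task Id : (task_\d+_[mr]_\d+)_(\d+),\s*Status : (\w+)")


def failed_attempts(log_text):
    """Return a dict mapping each task id to its number of FAILED attempts."""
    counts = defaultdict(int)
    for task, _attempt, status in ATTEMPT.findall(log_text):
        if status == "FAILED":
            counts[task] += 1
    return dict(counts)


# Hypothetical sample: three entries taken from the WordCount run above.
FAILURE_SAMPLE = """\
07/08/01 10:49:06 INFO mapred.JobClient: Task Id : task_0004_m_000018_0, 
Status : FAILED
07/08/01 10:49:11 INFO mapred.JobClient: Task Id : task_0004_m_000000_1, 
Status : FAILED
07/08/01 10:49:15 INFO mapred.JobClient: Task Id : task_0004_m_000000_2, 
Status : FAILED
"""

if __name__ == "__main__":
    for task, count in sorted(failed_attempts(FAILURE_SAMPLE).items()):
        print(task, count)
```

The console output only says that an attempt failed, not why; the reason would have to come from the task logs on the tasktracker that ran the attempt.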








********************************************************************************************


console messages while running WordCount on a 1master-1slave dfs 
architecture:

/opt/java/bin/java -Didea.launcher.port=7539 
-Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 
-classpath 
/opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/hadoopTest/classes/test/hadoopTest:/home/samuel/IdeaProjects/hadoopTest/classes/production/hadoopTest:/home/samuel/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-cli-1.1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-codec-1.3.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/commons-logging-1.0.4.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpclient-4.0-alpha1.jar:/home/samuel/IdeaProjects/hadoopTest/lib/http/httpcore-4.0-alpha5.jar:/home/samuel/IdeaProjects/hadoopTest/lib/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar:/home/samuel/IdeaProjects/hadoopTest/hadoop/conf:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/opt/idea-6180/lib/idea_rt.jar

com.intellij.rt.execution.application.AppMain 
com.lingway.hadoopScratchPad.WordCount /user/hadoop/documents 
/user/hadoop/resultats
07/08/01 11:08:24 INFO mapred.FileInputFormat: Total input paths to 
process : 19
07/08/01 11:08:26 INFO mapred.JobClient: Running job: job_0001
07/08/01 11:08:27 INFO mapred.JobClient:  map 0% reduce 0%
07/08/01 11:08:37 INFO mapred.JobClient: Task Id : task_0001_m_000002_0, 
Status : FAILED
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:37 INFO mapred.JobClient: Task Id : task_0001_m_000000_0, 
Status : FAILED
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:37 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:41 INFO mapred.JobClient: Task Id : task_0001_m_000005_0, 
Status : FAILED
07/08/01 11:08:41 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:41 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:42 INFO mapred.JobClient: Task Id : task_0001_m_000004_0, 
Status : FAILED
07/08/01 11:08:42 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:42 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:46 INFO mapred.JobClient: Task Id : task_0001_m_000006_0, 
Status : FAILED
07/08/01 11:08:46 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:46 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:47 INFO mapred.JobClient: Task Id : task_0001_m_000001_0, 
Status : FAILED
07/08/01 11:08:47 INFO mapred.JobClient: Task Id : task_0001_m_000007_0, 
Status : FAILED
07/08/01 11:08:47 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:48 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:48 INFO mapred.JobClient: Task Id : task_0001_m_000003_0, 
Status : FAILED
07/08/01 11:08:52 INFO mapred.JobClient: Task Id : task_0001_m_000008_0, 
Status : FAILED
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:52 INFO mapred.JobClient: Task Id : task_0001_m_000009_0, 
Status : FAILED
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:52 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_m_000010_0, 
Status : FAILED
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_r_000000_0, 
Status : FAILED
07/08/01 11:08:57 INFO mapred.JobClient: Task Id : task_0001_m_000011_0, 
Status : FAILED
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:08:57 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000012_0, 
Status : FAILED
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000013_0, 
Status : FAILED
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:02 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000002_1, 
Status : FAILED
07/08/01 11:09:02 INFO mapred.JobClient: Task Id : task_0001_m_000000_1, 
Status : FAILED
07/08/01 11:09:07 INFO mapred.JobClient: Task Id : task_0001_m_000014_0, 
Status : FAILED
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:07 INFO mapred.JobClient: Task Id : task_0001_m_000015_0, 
Status : FAILED
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:07 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:11 INFO mapred.JobClient: Task Id : task_0001_m_000016_0, 
Status : FAILED
07/08/01 11:09:11 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:11 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:12 INFO mapred.JobClient: Task Id : task_0001_m_000017_0, 
Status : FAILED
07/08/01 11:09:12 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:12 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:16 INFO mapred.JobClient: Task Id : task_0001_m_000018_0, 
Status : FAILED
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:16 INFO mapred.JobClient: Task Id : task_0001_m_000001_1, 
Status : FAILED
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:16 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:17 INFO mapred.JobClient: Task Id : task_0001_m_000004_1, 
Status : FAILED
07/08/01 11:09:18 INFO mapred.JobClient: Task Id : task_0001_m_000005_1, 
Status : FAILED
07/08/01 11:09:21 INFO mapred.JobClient: Task Id : task_0001_m_000000_2, 
Status : FAILED
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:21 INFO mapred.JobClient: Task Id : task_0001_m_000003_1, 
Status : FAILED
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:21 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:27 INFO mapred.JobClient:  map 100% reduce 100%
07/08/01 11:09:27 INFO mapred.JobClient: Task Id : task_0001_m_000001_2, 
Status : FAILED
07/08/01 11:09:27 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
07/08/01 11:09:27 WARN mapred.JobClient: Error reading task 
outputubuntu704.e-manation.com
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at com.lingway.hadoopScratchPad.WordCount.main(WordCount.java:145)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1


