hama-user mailing list archives

From Roman Shapovalov <shapova...@graphics.cs.msu.su>
Subject Re: Problem initializing pipes in HamaStreaming
Date Fri, 27 Sep 2013 11:17:00 GMT
> It seems Streaming could not find the Python files, since it searched
> them in the local file system.

It works if I specify references to the local files. However, if I set
hdfs://localhost/ as the default file system, I keep getting the
connection error. Could the port number matter?

Roman
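
One way to see whether the port matters is to probe the two ports that
appear in this thread: 8020, where the NameNode reports itself as active,
and 40000, the bsp.master.address value that the retry message names. The
sketch below is only an illustration. The helper name port_is_open is made
up here, and it assumes both services are expected on localhost.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Ports taken from the thread (assumed to be on localhost):
    # 8020 is the NameNode address, 40000 is bsp.master.address.
    for name, port in (("HDFS NameNode", 8020), ("BSPMaster", 40000)):
        state = "open" if port_is_open("localhost", port) else "closed"
        print("%s localhost:%d is %s" % (name, port, state))
```

If 8020 is open but 40000 is closed, the connection error would point at
the BSP master not listening rather than at HDFS.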

On Fri, Sep 27, 2013 at 6:55 AM, Roman Shapovalov
<shapovalov@graphics.cs.msu.su> wrote:
> Martin,
>
>> so you haven't started hdfs?
>
> I have not started it manually, but it has been active:
>
> NameNode '0.0.0.0:8020' (active)
> Started:Wed Sep 25 18:54:42 EDT 2013
>
>> Your hdfs should contain the following files:
>
> It does.
>
>> Without the default file system in hama-site.xml, it will not work.
>
> Well, at least Hama (without streaming) worked, using the local file system.
> It seems Streaming could not find the Python files, since it searched
> them in the local file system.
>
> Roman
>
> On Fri, Sep 27, 2013 at 6:30 AM, Martin Illecker <millecker@apache.org> wrote:
>> Hi Roman,
>>
>> so you haven't started hdfs? (start-dfs.sh)
>>
>> Are you able to access the hdfs namenode?
>> http://localhost:50070/dfshealth.jsp
>>
>> Your hdfs should contain the following files:
>>
>> $ hadoop fs -ls /tmp/PyStreaming/
>> Found 8 items
>> -rw-r--r--   279 2013-09-27 12:19 /tmp/PyStreaming/BSP.py
>> -rw-r--r--   5159 2013-09-27 12:19 /tmp/PyStreaming/BSPPeer.py
>> -rw-r--r--   379 2013-09-27 12:19 /tmp/PyStreaming/BSPRunner.py
>> -rw-r--r--   970 2013-09-27 12:19 /tmp/PyStreaming/BinaryProtocol.py
>> -rw-r--r--   299 2013-09-27 12:19 /tmp/PyStreaming/BspJobConfiguration.py
>> -rw-r--r--   557 2013-09-27 12:19 /tmp/PyStreaming/HelloWorldBSP.py
>> -rw-r--r--   5570 2013-09-27 12:19 /tmp/PyStreaming/KMeansBSP.py
>> -rw-r--r--   326 2013-09-27 12:19 /tmp/PyStreaming/README
>>
>> Without the default file system in hama-site.xml, it will not work.
>>
>> Martin
>>
>>
>> 2013/9/27 Roman Shapovalov <shapovalov@graphics.cs.msu.su>
>>
>>> Martin,
>>>
>>> if I set default file system to hdfs://localhost/, I get the connection
>>> error:
>>>
>>> 13/09/27 14:04:11 INFO ipc.Client: Retrying connect to server:
>>> localhost/127.0.0.1:40000. Already tried 0 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> SECONDS)
>>>
>>> (and 10 more times like that, then I get a java.net.ConnectException).
>>>
>>> I attach the hama-site.xml (as it was before adding the default fs
>>> property). I had only added the bsp.master.address property to switch
>>> to pseudo-distributed mode.
>>>
>>> Roman
>>>
>>> On Fri, Sep 27, 2013 at 4:20 AM, Martin Illecker <martin@illecker.at>
>>> wrote:
>>> > Hi Roman!
>>> >
>>> > Did you set up the default filesystem in hama-site.xml?
>>> >
>>> > Please submit your hama-site.xml configuration.
>>> >
>>> > Martin
>>> >
>>> >
>>> > hama-site.xml - pseudo-distributed mode
>>> >
>>> > <configuration>
>>> >
>>> >     <property>
>>> >         <name>bsp.master.address</name>
>>> >         <value>localhost:40000</value>
>>> >         <description>The address of the bsp master server. Either the
>>> >             literal string "local" or a host:port for distributed mode
>>> >         </description>
>>> >     </property>
>>> >
>>> >     <property>
>>> >         <name>fs.default.name</name>
>>> >         <value>hdfs://localhost/</value>
>>> >         <description>
>>> >             The name of the default file system. Either the literal string
>>> >             "local" or a host:port for HDFS.
>>> >         </description>
>>> >     </property>
>>> >
>>> >     <property>
>>> >         <name>hama.zookeeper.quorum</name>
>>> >         <value>localhost</value>
>>> >         <description>Comma separated list of servers in the ZooKeeper Quorum.
>>> >             For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>> >             By default this is set to localhost for local and pseudo-distributed modes
>>> >             of operation. For a fully-distributed setup, this should be set to a full
>>> >             list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
>>> >             this is the list of servers which we will start/stop zookeeper on.
>>> >         </description>
>>> >     </property>
>>> >
>>> > </configuration>
>>> >
>>> >
>>> > On 27.09.2013 at 09:32, Roman Shapovalov <shapovalov@graphics.cs.msu.su> wrote:
>>> >
>>> >> Edward,
>>> >>
>>> >> Yes, I did. See the logs in my previous message.
>>> >>
>>> >> Roman
>>> >>
>>> >> On Fri, Sep 27, 2013 at 7:15 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>> >>> Have you tried to run in pseudo-distributed mode?
>>> >>>
>>> >>> On Fri, Sep 27, 2013 at 5:47 AM, Roman Shapovalov
>>> >>> <shapovalov@graphics.cs.msu.su> wrote:
>>> >>>> Martin,
>>> >>>>
>>> >>>> Thanks for such verbose instructions.
>>> >>>>
>>> >>>>> You can find all Hama configuration files in the *conf* folder.
>>> >>>>
>>> >>>> OK, I thought Edward meant Hadoop configs specifically.
>>> >>>> I have only added JAVA_HOME variable there, otherwise they are default.
>>> >>>>
>>> >>>>> You should also find task logs in your *temp* folder.
>>> >>>>
>>> >>>> I found the folder, but there were no .log files in the attempt*
>>> >>>> folders (in both modes).
>>> >>>>
>>> >>>>> Normally you should find it in *hama/logs/tasklogs*.
>>> >>>>
>>> >>>> They appear in the pseudo-distributed mode only (which also fails).
>>> >>>> See the attached file.
>>> >>>>
>>> >>>>> By the way do you have python3.2 installed? :-)
>>> >>>>
>>> >>>> Yes. "python" links to Python 2.6, but I pass "python3.2" as an
>>> >>>> interpreter, which links to the correct version.
>>> >>>>
>>> >>>>
>>> >>>> Roman
>>> >>>>
>>> >>>> On Thu, Sep 26, 2013 at 4:03 PM, Martin Illecker <millecker@apache.org> wrote:
>>> >>>>> Hi Roman,
>>> >>>>>
>>> >>>>> if you are running Hama in local mode, it will not use HDFS anyway.
>>> >>>>>
>>> >>>>> You can find all Hama configuration files in the *conf* folder.
>>> >>>>>
>>> >>>>> $ll hama/conf/
>>> >>>>> total 56
>>> >>>>> -rwxr-xr-x groomservers*
>>> >>>>> -rwxr-xr-x hama-default.xml*
>>> >>>>> -rwxr-xr-x hama-env.sh*
>>> >>>>> -rwxr-xr-x hama-site.xml*
>>> >>>>> -rwxr-xr-x log4j.properties*
>>> >>>>>
>>> >>>>> Probably you should set up the Pseudo Distributed Mode [1] in
>>> >>>>> hama-site.xml.
>>> >>>>>
>>> >>>>> But the task log would be very interesting.
>>> >>>>>
>>> >>>>> Normally you should find it in *hama/logs/tasklogs*.
>>> >>>>> e.g.,
>>> >>>>> hama/logs/tasklogs/job_201309262134_0001/attempt_201309262134_0001_000000_0.log
>>> >>>>>
>>> >>>>> You should also find task logs in your *temp* folder.
>>> >>>>> But this location will depend on your operating system.
>>> >>>>> e.g., in OSX
>>> >>>>> /private/tmp/hadoop-YOURUSER/bsp/local/groomServer/attempt_201309262134_0001_000000_0/work/tasklogs/
>>> >>>>>
>>> >>>>> By the way do you have python3.2 installed? :-)
>>> >>>>> $ python --version
>>> >>>>> Python 3.2.5
>>> >>>>> $ python3.2 --version
>>> >>>>> Python 3.2.5
>>> >>>>>
>>> >>>>> May I ask which operating system you use?
>>> >>>>>
>>> >>>>> Martin
>>> >>>>>
>>> >>>>> [1] http://wiki.apache.org/hama/GettingStarted#Pseudo_Distributed_Mode
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> 2013/9/26 Roman Shapovalov <shapovalov@graphics.cs.msu.su>
>>> >>>>>
>>> >>>>>> Hi Edward,
>>> >>>>>>
>>> >>>>>> Could you please be more specific? (Sorry, I am new to this stuff)
>>> >>>>>>
>>> >>>>>> I run Hama in local mode. The logs/ directory is empty, and I did not
>>> >>>>>> find any logs in HDFS either.
>>> >>>>>>
>>> >>>>>> And where can I find the Hadoop configuration?
>>> >>>>>>
>>> >>>>>> Thank you,
>>> >>>>>> Roman
>>> >>>>>>
>>> >>>>>> On Thu, Sep 26, 2013 at 12:05 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>> >>>>>>> Hi,
>>> >>>>>>>
>>> >>>>>>> That's strange. Can you attach your namenode logs and hadoop
>>> >>>>>>> configurations?
>>> >>>>>>>
>>> >>>>>>> On Thu, Sep 26, 2013 at 11:03 PM, Roman Shapovalov
>>> >>>>>>> <shapovalov@graphics.cs.msu.su> wrote:
>>> >>>>>>>> Hi again,
>>> >>>>>>>>
>>> >>>>>>>> I have updated both Hama (from the trunk) and Streaming (from Martin's
>>> >>>>>>>> github), and checked that patches have been applied, but I keep
>>> >>>>>>>> getting the same error (full log for local configuration is attached).
>>> >>>>>>>>
>>> >>>>>>>> Another thing may be relevant: I keep the default Hadoop libraries in
>>> >>>>>>>> lib/. If I replace them as the tutorial says, some classes cannot be
>>> >>>>>>>> found even if I run pure Hama (which works perfectly with default
>>> >>>>>>>> libs). I don't know if it is important.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Roman
>>> >>>>>>>>
>>> >>>>>>>> On Tue, Sep 24, 2013 at 9:22 AM, Martin Illecker <millecker@apache.org> wrote:
>>> >>>>>>>>> Hi Roman,
>>> >>>>>>>>>
>>> >>>>>>>>> sorry for the inconvenience!
>>> >>>>>>>>> The problem has been reported [1] and will be fixed shortly in the trunk.
>>> >>>>>>>>>
>>> >>>>>>>>> [1] https://issues.apache.org/jira/browse/HAMA-805
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> 2013/9/23 Edward J. Yoon <edwardyoon@apache.org>
>>> >>>>>>>>>
>>> >>>>>>>>>> This looks like a bug in DistCacheUtils.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks for your report. I'll look at it tomorrow.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Mon, Sep 23, 2013 at 11:52 PM, Roman Shapovalov
>>> >>>>>>>>>> <shapovalov@graphics.cs.msu.su> wrote:
>>> >>>>>>>>>>> Hello all,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I am trying to use Hama Streaming.
>>> >>>>>>>>>>> I have successfully installed Hama (the Pi example works).
>>> >>>>>>>>>>> I am following this tutorial:
>>> >>>>>>>>>>> http://wiki.apache.org/hama/HamaStreaming
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> When I try to run the distributed HelloWorld in the local
>>> >>>>>>>>>>> configuration, I get the following error:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> $ bin/hama pipes -streaming true -bspTasks 3 -interpreter python3.2 \
>>> >>>>>>>>>>>     -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ \
>>> >>>>>>>>>>>     -program /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO pipes.Submitter: Streaming enabled!
>>> >>>>>>>>>>> 13/09/23 18:03:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:03:50 WARN bsp.BSPJobClient: No job jar file set.  User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO bsp.LocalBSPRunner: Setting up a new barrier for 3 tasks!
>>> >>>>>>>>>>> 13/09/23 18:03:50 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>> >>>>>>>>>>> java.lang.NullPointerException
>>> >>>>>>>>>>>    at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:44)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:255)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> >>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> >>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>>>>>>>>>>    at java.lang.Thread.run(Thread.java:662)
>>> >>>>>>>>>>> [output cropped]
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> When I switch to pseudo-distributed mode, the job fails too (after a
>>> >>>>>>>>>>> minute of execution):
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 13/09/23 18:46:34 INFO pipes.Submitter: Streaming enabled!
>>> >>>>>>>>>>> 13/09/23 18:46:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:46:34 WARN bsp.BSPJobClient: No job jar file set.  User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
>>> >>>>>>>>>>> 13/09/23 18:46:34 INFO bsp.BSPJobClient: Running job: job_201309231846_0001
>>> >>>>>>>>>>> 13/09/23 18:47:40 INFO bsp.BSPJobClient: Job failed.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Task log contains errors:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: Starting Socket Reader #1 for port 43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server Responder: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server listener on 43475: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO message.HadoopMessageManagerImpl:  BSPPeer address:localhost.localdomain port:43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server handler 0 on 43475: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At localhost.localdomain/127.0.0.1:43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
>>> >>>>>>>>>>> java.lang.NullPointerException
>>> >>>>>>>>>>>    at java.io.File.<init>(File.java:222)
>>> >>>>>>>>>>>    at org.apache.hama.pipes.PipesApplication.setupCommand(PipesApplication.java:130)
>>> >>>>>>>>>>>    at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:257)
>>> >>>>>>>>>>>    at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:44)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:176)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:146)
>>> >>>>>>>>>>>    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1246)
>>> >>>>>>>>>>> [output cropped]
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I use the latest trunk version of Hama, Python 3.2.5 and Hadoop
>>> >>>>>>>>>>> 2.0.0-cdh4.1.1.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Please help me to figure out the problem.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks in advance,
>>> >>>>>>>>>>> Roman
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> --
>>> >>>>>>>>>> Best Regards, Edward J. Yoon
>>> >>>>>>>>>> @eddieyoon
>>> >>>>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Best Regards, Edward J. Yoon
>>> >>>>>>> @eddieyoon
>>> >>>>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Best Regards, Edward J. Yoon
>>> >>> @eddieyoon
>>> >
>>>
