giraph-user mailing list archives

From Ken Williams <zoo9...@hotmail.com>
Subject RE: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
Date Thu, 05 Sep 2013 12:10:11 GMT
Hi Claudio,
The patch worked!! :-)
Just to be clear, I am running Giraph 1.0.0 (not a git clone) and Hadoop 2.0.0-cdh4.1.1.
I applied your patch and rebuilt the Giraph source code with this command:
       mvn -Phadoop_2.0.0 clean compile package test install verify
This built correctly, with no exceptions and no failed tests.
I then ran the Giraph example, which ran successfully with this command:
[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/root/input/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/root/output/shortestpaths \
    -w 1
I then deleted the output:
       hadoop fs -rm -R /user/root/output/shortestpaths
I then restarted my HBase daemons and ran the Giraph example again. It worked again: no errors, no exceptions, no failed tasks, and the output was produced correctly.
Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port 22181.
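A plain grep for 22181 matches any line containing that string, including established connections. A stricter variant of the same check is sketched below. The function name is_listening is hypothetical, and the sketch assumes Linux-style `netstat -an` output with the local address in column 4 and the socket state in column 6.

```shell
# is_listening PORT
# Hypothetical helper: reads `netstat -an` output on stdin and succeeds only if
# some socket is in LISTEN state on exactly PORT.
# Assumes the Linux netstat layout: local address in column 4, state in column 6.
is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage: netstat -an | is_listening 22181 && echo "something is listening on 22181"
```

Anchoring the match on ":PORT" at the end of the local-address column keeps a query for 2181 from matching a socket on 22181.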
Thank you very much for your help :-)
Ken

From: claudio.martella@gmail.com
Date: Wed, 4 Sep 2013 19:21:37 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

Giraph ships with ZooKeeper 3.3.3. Unless an existing ZooKeeper is specified through the giraph.zkServerList parameter, Giraph runs its own instance, with its own configuration, listening on port 22181.

On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams <zoo9000@hotmail.com> wrote:

Hmmmmmmmm. Interesting.
Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper?
The only version of ZooKeeper I have installed is the one that came with HBase, and the config file it uses, /etc/zookeeper/conf/zoo.cfg, specifies clientPort=2181. This is the only zoo.cfg file on my machine.

[root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
....
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
[root@localhost Downloads]#
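
To compare the clientPort in a zoo.cfg like the one above with the port used by Giraph's own ZooKeeper, a small sketch follows. The names client_port, ports_conflict and GIRAPH_ZK_PORT are hypothetical; 22181 is the giraph.zkServerPort default stated in this thread.

```shell
# client_port: hypothetical helper that prints the clientPort value from
# zoo.cfg-style text on stdin. Comment lines start with # and never match.
client_port() {
  sed -n 's/^clientPort=\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Assumption: 22181 is the default of giraph.zkServerPort, as stated in this thread.
GIRAPH_ZK_PORT=22181

# ports_conflict CFG_FILE: succeeds if the file's clientPort equals GIRAPH_ZK_PORT.
ports_conflict() {
  [ "$(client_port < "$1")" = "$GIRAPH_ZK_PORT" ]
}

# Usage: ports_conflict /etc/zookeeper/conf/zoo.cfg && echo "same port as Giraph's ZooKeeper"
```

With the file shown above (clientPort=2181), ports_conflict reports no clash with 22181.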


From: claudio.martella@gmail.com
Date: Wed, 4 Sep 2013 12:13:50 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org



That should in principle not be the case, as the ZooKeeper started by Giraph listens on a different port than the default. See the parameter giraph.zkServerPort, which defaults to 22181.

On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams <zoo9000@hotmail.com> wrote:

Hi Claudio,
I think I have fixed the problem.
HBase runs with its own copy of ZooKeeper, which listens on port 2181. So, when I tried to start ZooKeeper for Giraph, it also tried to listen on port 2181, found it was already in use, and then terminated, which is why Giraph failed.
If I stop the HBase daemons (including its copy of ZooKeeper), then Giraph runs fine.
Essentially, running ZooKeeper for Giraph conflicts with a ZooKeeper that is already running for HBase.
I will try the patch and get back to you.
Thanks for all your help,
Ken

From: claudio.martella@gmail.com
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

Try with the attached patch applied to trunk, without the -D giraph.zkManagerDirectory option mentioned earlier.

On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams <zoo9000@hotmail.com> wrote:

Hi Claudio,
I tried this but it made no difference. The map tasks still fail, there is still no output, and there is still an exception in the log files: FileNotFoundException: File /tmp/giraph/_zkServer does not exist.

[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner -Dgiraph.zkManagerDirectory='/tmp/giraph/' \
    org.apache.giraph.examples.SimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/root/input/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/root/output/shortestpaths \
    -w 1
13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
13/09/03 14:20:12 INFO mapred.JobClient:     Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient:     Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
[root@localhost giraph]#

When I try to run ZooKeeper it still gives me an 'Address already in use' exception.
[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running  in standalone mode
2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] - Starting server
2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=localhost.localdomain
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.6.0_31
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=i386
2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@726] - tickTime set to 2000
2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 14:23:37,935 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 14:23:37,970 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
	at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
[root@localhost giraph]#

Thank you for any help,
Ken

From: claudio.martella@gmail.com
Date: Tue, 3 Sep 2013 12:43:59 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

Can you try defining the ZooKeeper manager directory from the command line? Like this:
       -D giraph.zkManagerDirectory=/path/in/hdfs/foobar
You'll have to delete this directory by hand before each job. This is just to see whether it solves the problem; then I would know how to fix it.
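
As a sketch of that manual cleanup step, the delete could be scripted as below. The helper names run and clean_zk_dir and the DRY_RUN variable are hypothetical, the directory is the placeholder path from the message, and the delete uses the `hadoop fs -rm -R` form that appears elsewhere in this thread.

```shell
# Placeholder path from the message above; replace with a real HDFS directory.
ZK_DIR=/path/in/hdfs/foobar

# run: hypothetical helper. With DRY_RUN set, print the command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# clean_zk_dir DIR: remove the ZooKeeper manager directory left by a previous job.
clean_zk_dir() {
  run hadoop fs -rm -R "$1"
}

# Usage, before each job submission: clean_zk_dir "$ZK_DIR"
# The job would then be started with -Dgiraph.zkManagerDirectory="$ZK_DIR".
```

Setting DRY_RUN=1 first shows the exact command without touching HDFS.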









On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams <zoo9000@hotmail.com> wrote:

Hi Pradeep,
Yes, the ZooKeeper server is definitely running. I can connect to it with the command-line client:
[root@localhost giraph]# zkCli.sh  -server 127.0.0.1:2181
Connecting to 127.0.0.1:2181
2013-09-03 11:15:45,987 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=localhost.localdomain
2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_31
......
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1]

However, I am a bit confused. If I look in the ZooKeeper log file I see this port 2181 'Address already in use' error:
2013-09-03 10:52:24,412 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 10:52:24,413 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 10:52:24,436 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)

The process listening on port 2181 is PID 2892, which turns out to be HBase.
[root@localhost giraph]# fuser 2181/tcp
2181/tcp:             2892
[root@localhost giraph]# ps aux | grep 2892
hbase     2892  0.1  3.2 719592 119624 ?       Sl   Aug29   7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/..
......
So I am not sure what my ZooKeeper client is connecting to. It seems to be connecting to a ZooKeeper server, but when I run 'ps' I cannot see a ZooKeeper server running.

Here is my zoo.cfg file:
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888

Thanks for any help,
Ken

-- 
    Claudio Martella
    claudio.martella@gmail.com   
 		 	   		  

