giraph-user mailing list archives

From: Claudio Martella <claudio.marte...@gmail.com>
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
Date: Wed, 04 Sep 2013 10:13:50 GMT

In principle that should not be the case: the ZooKeeper instance that
Giraph starts listens on a different port from ZooKeeper's default (2181).
See the parameter giraph.zkServerPort, which defaults to 22181.
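
One quick way to see whether port 2181 or port 22181 is actually taken on a
machine is to try binding to it. The sketch below is not part of Giraph or
ZooKeeper; port_is_free is a hypothetical helper name, and the check is only
approximate (it reports the state at the moment it runs, and a bind can also
fail for reasons other than another listener).

```python
import socket

GIRAPH_ZK_PORT = 22181   # default of giraph.zkServerPort, as stated above
DEFAULT_ZK_PORT = 2181   # clientPort from the zoo.cfg quoted in this thread


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use", as in the BindException
            # reported by ZooKeeper in this thread.
            return False
        return True


if __name__ == "__main__":
    for name, port in (("ZooKeeper default", DEFAULT_ZK_PORT),
                       ("Giraph-managed ZooKeeper", GIRAPH_ZK_PORT)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{name} port {port}: {state}")
```

If 22181 itself turns out to be in use, giraph.zkServerPort can presumably be
set to another value with -D on the command line, in the same way
giraph.zkManagerDirectory is passed elsewhere in this thread.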


On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams <zoo9000@hotmail.com> wrote:

> Hi Claudio,
>
>     I think I have fixed the problem.
>
>    HBase runs its own copy of ZooKeeper, which listens on port 2181.
>    So when I tried to start ZooKeeper for Giraph, it also tried to listen
>    on port 2181, found the port already in use, and terminated, which is
>    why Giraph failed.
>    If I stop the HBase daemons (including HBase's copy of ZooKeeper), then
>    Giraph runs fine.
>
>    Essentially, running ZooKeeper for Giraph conflicts with a ZooKeeper
>    that is already running for HBase.
>
>    I will try the patch and get back to you.
>
>    Thanks for all your help,
>
> Ken
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> Try the attached patch applied to trunk, without the -D
> giraph.zkManagerDirectory option mentioned earlier.
>
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams <zoo9000@hotmail.com> wrote:
>
> Hi Claudio,
>
>     I tried this but it made no difference. The map tasks still fail,
> there is still no output, and there is still an exception in the log files:
> FileNotFoundException: File /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar \
>     /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
>     org.apache.giraph.GiraphRunner \
>     -Dgiraph.zkManagerDirectory='/tmp/giraph/' \
>     org.apache.giraph.examples.SimpleShortestPathsVertex \
>     -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>     -vip /user/root/input/tiny_graph.txt \
>     -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>     -op /user/root/output/shortestpaths \
>     -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient:     Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient:     Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> [root@localhost giraph]#
>
>
> When I try to run Zookeeper it still gives me an 'Address already in use'
> exception.
>
> [root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
> JMX enabled by default
> Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
> 2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
> 2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running  in standalone mode
> 2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] - Starting server
> 2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=localhost.localdomain
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.6.0_31
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=i386
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
> 2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@726] - tickTime set to 2000
> 2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 14:23:37,935 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 14:23:37,970 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
> [root@localhost giraph]#
>
>
>       Thank you for any help,
>
> Ken
>
>
>
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Tue, 3 Sep 2013 12:43:59 +0200
>
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
>
> Can you try defining the ZooKeeper manager directory from the command
> line, like this: -D giraph.zkManagerDirectory=/path/in/hdfs/foobar
>
> You'll have to delete this directory by hand before each job. This is just
> to see whether it solves the problem; if it does, I will know how to fix it.
>
>
> On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams <zoo9000@hotmail.com> wrote:
>
> Hi Pradeep,
>
> Yes, the ZooKeeper server is definitely running; I can connect to it with
> the command-line client:
>
> [root@localhost giraph]# zkCli.sh  -server 127.0.0.1:2181
> Connecting to 127.0.0.1:2181
> 2013-09-03 11:15:45,987 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=localhost.localdomain
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_31
> ......
> WatchedEvent state:SyncConnected type:None path:null
> [zk: 127.0.0.1:2181(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 127.0.0.1:2181(CONNECTED) 1]
>
>
> However, I am a bit confused. If I look in the ZooKeeper log file, I see
> this 'Address already in use' error for port 2181:
>
> 2013-09-03 10:52:24,412 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 10:52:24,413 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 10:52:24,436 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>
> The process listening on port 2181 is 2892, which turns out to be HBase.
>
> [root@localhost giraph]# fuser 2181/tcp
> 2181/tcp:             2892
> [root@localhost giraph]# ps aux | grep 2892
> hbase     2892  0.1  3.2 719592 119624 ?       Sl   Aug29   7:35
> /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m
> -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase
> -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log
> -Dhbase.home.dir=/usr/lib/hbase/bin/..
> ......
>
> So I am not sure what my ZooKeeper client is connecting to.
> It seems to be connecting to a ZooKeeper server, but when I run 'ps' I
> cannot see a ZooKeeper server running.
> Here is my zoo.cfg file:
>
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
>
>     Thanks for any help,
>
> Ken
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com
