From: Claudio Martella
Date: Thu, 5 Sep 2013 16:50:43 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: "user@giraph.apache.org"

great, i need to get a review soon to get the patch in the codebase.

On Thu, Sep 5, 2013 at 2:10 PM, Ken Williams wrote:

> Hi Claudio,
>
> The patch worked !! :-)
>
> Just to be clear,
>     I am running Giraph (1.0.0), not git cloned.
>     and hadoop 2.0.0-cdh4.1.1
>
> I applied your patch and rebuilt the giraph source code with
> this command,
>     mvn -Phadoop_2.0.0 clean compile package test install verify
>
> This built correctly, with no exceptions and no tests failed.
>
> I then ran the giraph example, which ran successfully with this command
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsVertex -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/root/input/tiny_graph.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/root/output/shortestpaths -w 1
>
> I then deleted the output
>     hadoop fs -rm -R /user/root/output/shortestpaths
>
> I then restarted my HBase daemons, and ran the giraph example again, and
> it worked successfully again,
> no errors, no exceptions, no tasks failed, and output produced correctly.
>
> Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on
> port 22181.
>
> Thank you very much for your help :-)
>
> Ken
>
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Wed, 4 Sep 2013 19:21:37 +0200
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> Giraph is shipped with Zookeeper 3.3.3, and it is run, if an existing
> zookeeper is not used through the giraph.zkServerList parameter, with its
> own configuration listening on port 22181.
>
>
> On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams wrote:
>
> Hmmmmmmmm. Interesting.
>
> Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
>
> The only version of ZooKeeper I have installed is the one that came with
> HBase, and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies
> clientPort=2181
> This is the only zoo.cfg file on my machine.
>
>
> [root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
> ....
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
> [root@localhost Downloads]#
>
>
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Wed, 4 Sep 2013 12:13:50 +0200
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> That should in principle not be the case, as the zookeeper started by
> Giraph listens on a different port than the default. See
> parameter giraph.zkServerPort, which defaults to 22181.
>
>
> On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams wrote:
>
> Hi Claudio,
>
> I think I have fixed the problem.
>
> HBase runs with its own copy of ZooKeeper which listens on port 2181.
> So, when I tried to start ZooKeeper for Giraph it also tried to listen
> on port 2181 and found it was already in use, and then it terminated -
> which is why Giraph failed.
> If I stop the HBase daemons (including its copy of ZooKeeper) then
> Giraph runs fine.
>
> Essentially there is a conflict between running ZooKeeper for Giraph,
> if there is already ZooKeeper running for HBase.
>
> I will try the patch and get back to you.
>
> Thanks for all your help,
>
> Ken
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> try with the attached patch applied to trunk, without the mentioned -D
> giraph.zkManagerDirectory.
>
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams wrote:
>
> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail,
> still no output, and still an exception in the log files -
> FileNotFoundException: File /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar
> /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> -Dgiraph.zkManagerDirectory='/tmp/giraph/'
> org.apache.giraph.examples.SimpleShortestPathsVertex -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
> -vip /user/root/input/tiny_graph.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/root/output/shortestpaths -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient:   Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient:     Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient:     Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> [root@localhost giraph]#
>
>
> When I try to run Zookeeper it still gives me an 'Address already in use'
> exception.
>
> [root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
> JMX enabled by default
> Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,882 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> 2013-09-03 14:23:37,889 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
> 2013-09-03 14:23:37,890 [myid:] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
> 2013-09-03 14:23:37,890 [myid:] - WARN  [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
> 2013-09-03 14:23:37,904 [myid:] - INFO  [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,905 [myid:] - INFO  [main:ZooKeeperServerMain@100] - Starting server
> 2013-09-03 14:23:37,920 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:host.name=localhost.localdomain
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.6.0_31
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
> 2013-09-03 14:23:37,921 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.name=Linux
> 2013-09-03 14:23:37,922 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=i386
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.name=root
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
> 2013-09-03 14:23:37,923 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
> 2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@726] - tickTime set to 2000
> 2013-09-03 14:23:37,934 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 14:23:37,935 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 14:23:37,970 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
> [root@localhost giraph]#
>
>
> Thank you for any help,
>
> Ken
>
>
> ------------------------------
> From: claudio.martella@gmail.com
> Date: Tue, 3 Sep 2013 12:43:59 +0200
> Subject: Re: FileNotFoundException: File
> _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
>
> can you try defining the zookeeper manager directory from the command
> line? like this -D giraph.zkManagerDirectory=/path/in/hdfs/foobar
>
> you'll have to delete this directory by hand before each job. Just to see
> if it solves the problem. Then I could know how to fix it.
>
>
> On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams wrote:
>
> Hi Pradeep,
>
> Yes, the zookeeper server is definitely running, I can connect to it with
> the command-line client
>
> [root@localhost giraph]# zkCli.sh -server 127.0.0.1:2181
> Connecting to 127.0.0.1:2181
> 2013-09-03 11:15:45,987 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=localhost.localdomain
> 2013-09-03 11:15:45,990 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_31
> ......
> WatchedEvent state:SyncConnected type:None path:null
> [zk: 127.0.0.1:2181(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 127.0.0.1:2181(CONNECTED) 1]
>
>
> However, I am a bit confused.
> If I look in the zookeeper log-file I see this port 2181 'Address already
> in use' error,
>
> 2013-09-03 10:52:24,412 [myid:] - INFO  [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 10:52:24,413 [myid:] - INFO  [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 10:52:24,436 [myid:] - INFO  [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>
> The process listening on port 2181 is 2892, which turns out to be HBase.
>
> [root@localhost giraph]# fuser 2181/tcp
> 2181/tcp:             2892
> [root@localhost giraph]# ps aux | grep 2892
> hbase     2892  0.1  3.2 719592 119624 ?       Sl   Aug29   7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/..
> ......
>
> So I am not sure what my zookeeper client is connecting to.
> It seems to be connecting to a zookeeper server but when I do 'ps' I
> cannot see a zookeeper server running.
> Here is my zoo.cfg file,
>
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
>
> Thanks for any help,
>
> Ken

--
Claudio Martella
claudio.martella@gmail.com
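
For anyone hitting the same symptom: the thread boils down to two ZooKeeper ports, 2181 (the instance started with HBase) and 22181 (the default for the instance Giraph starts itself). A quick way to see which of them is being served is sketched below. This is only an illustration, not part of Giraph or ZooKeeper; the helper names port_in_use and report are made up for this example, and the script only tests whether something accepts TCP connections on a port, much like the 'netstat -an | grep 22181' check in the thread.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; it does not verify that the
    listener is actually a ZooKeeper server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def report(ports, host="127.0.0.1"):
    """Map each port to True (a listener answered) or False (none did)."""
    return {port: port_in_use(port, host) for port in ports}


if __name__ == "__main__":
    # 2181: ZooKeeper started with HBase; 22181: Giraph's default port.
    for port, busy in report([2181, 22181]).items():
        print(port, "in use" if busy else "free")
```

If 2181 shows as in use while no Giraph job is running, a second ZooKeeper started with the stock zoo.cfg (clientPort=2181) would be expected to fail with the BindException quoted in the thread.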