Return-Path: X-Original-To: apmail-giraph-user-archive@www.apache.org Delivered-To: apmail-giraph-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A0BF6933A for ; Tue, 12 Mar 2013 17:37:20 +0000 (UTC) Received: (qmail 34681 invoked by uid 500); 12 Mar 2013 17:27:06 -0000 Delivered-To: apmail-giraph-user-archive@giraph.apache.org Received: (qmail 12305 invoked by uid 500); 12 Mar 2013 17:26:24 -0000 Mailing-List: contact user-help@giraph.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@giraph.apache.org Delivered-To: mailing list user@giraph.apache.org Received: (qmail 34933 invoked by uid 99); 12 Mar 2013 16:56:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Mar 2013 16:56:56 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ameetkini@gmail.com designates 209.85.217.179 as permitted sender) Received: from [209.85.217.179] (HELO mail-lb0-f179.google.com) (209.85.217.179) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Mar 2013 16:56:49 +0000 Received: by mail-lb0-f179.google.com with SMTP id j14so156381lbo.24 for ; Tue, 12 Mar 2013 09:56:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:content-type; bh=+p8CTFyAZl8E3cONzaelMYm90Do7GfxbmyshmBLCOCY=; b=LSMf/VsR5EaNwRTIn5kGxkkKl4LExW9dVEFey0ECcfXv8kmuMtUwLsP7MP8FBCLhGR og6fRi/cTCeOQB520RYVmRG0Ri5ogKxyWbXcqFn2I5jaN4rBKpClpqfOCfVEBNYOxBDt HLCN+i5/GJESwBdQnL+oOL3sNCjRktlkLLR/uG4s5GLs8rSqjs47VtFaVd68Yj5QCzuQ oPHBL4NAsO/D/YVVrjuqEWsA70ju3QPdEGwN5l1mENV2hSgKu/oXpO5nofWrOYskdleA JvapSOCuKFlxV0p6A7E7EeQUzZOQ0HR5DAhCNs/xJDEIP9O+XYy4P9ehe0s+xn4JFwB8 rX6g== X-Received: by 10.112.82.164 with SMTP id j4mr6464583lby.112.1363107388819; Tue, 12 Mar 2013 09:56:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.112.154.8 with HTTP; Tue, 12 Mar 2013 09:56:08 -0700 (PDT) In-Reply-To: References: <513B6DE9.3020906@apache.org> <513BFB14.2020101@apache.org> From: Ameet Kini Date: Tue, 12 Mar 2013 12:56:08 -0400 Message-ID: Subject: Re: Zookeeper exception while running SimpleShortestPathsVertexTest To: user@giraph.apache.org Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Checked: Checked by ClamAV on apache.org I looked at this some more today, and turns out my original problem was caused by something else and not the ZookeeperException, as Avery suggested. Turns out my giraph project was built with hadoop 0.20.203 and I was using 0.20.2, which does not include the org.apache.hadoop.security module. Rebuilding giraph with -Phadoop_non_secure as recommended by README fixed that problem but now I have a new one. The "Input path does not exist" below is caused by the unit test looking for the directory containing "vertices.txt" in HDFS and that file doesn't exist there. The parent directory hdfs://localhost:9000/tmp/giraph-SimpleShortestPathsVertex-1988055955408768000 does exist, but doesn't contain vertices.txt. What is more confusing is that the local directory file:///tmp/giraph-SimpleShortestPathsVertex-1988055955408768000/ exists but also doesn't contain vertices.txt. I initially thought that the unit test maintains all its data in the local file system, but looks like its missing vertices.txt, and I'm also unsure as to why TextVertexInputFormat looks for that directory in HDFS versus the local file system. ... 8 more 2013-03-12 12:33:42,386 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: karadi/127.0.1.1:30010 with up to 16 threads on bind attempt 1 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1 2013-03-12 12:33:42,386 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: karadi/127.0.1.1:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1 2013-03-12 12:33:42,391 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder. 2013-03-12 12:33:42,391 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder. 2013-03-12 12:33:42,397 INFO org.apache.giraph.graph.BspServiceMaster: becomeMaster: I am now the master! 2013-03-12 12:33:42,404 INFO org.apache.giraph.graph.GraphMapper: setup: Registering health of this worker... 2013-03-12 12:33:42,412 INFO org.apache.giraph.graph.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201303121015_0006/_masterJobState) 2013-03-12 12:33:42,428 INFO org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir already exists! 2013-03-12 12:33:42,481 INFO org.apache.giraph.graph.BspService: process: applicationAttemptChanged signaled 2013-03-12 12:33:42,481 INFO org.apache.giraph.graph.BspService: process: applicationAttemptChanged signaled 2013-03-12 12:33:42,528 INFO org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir already exists! 2013-03-12 12:33:42,612 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected) 2013-03-12 12:33:42,612 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected) 2013-03-12 12:33:42,670 INFO org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/karadi_0 and workerInfo= Worker(hostname=karadi, MRtaskID=0, port=30010) 2013-03-12 12:33:42,702 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.graph.MasterThread, msg = generateVertexInputSplits: Got IOException, exiting... java.lang.IllegalStateException: generateVertexInputSplits: Got IOException at org.apache.giraph.graph.BspServiceMaster.generateInputSplits(BspServiceMaster.java:268) at org.apache.giraph.graph.BspServiceMaster.createInputSplits(BspServiceMaster.java:568) at org.apache.giraph.graph.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:633) at org.apache.giraph.graph.MasterThread.run(MasterThread.java:102) Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/giraph-SimpleShortestPathsVertex-1988055955408768000/vertices.txt at org.apache.giraph.io.GiraphFileInputFormat.listStatus(GiraphFileInputFormat.java:232) at org.apache.giraph.io.GiraphFileInputFormat.listVertexStatus(GiraphFileInputFormat.java:247) at org.apache.giraph.io.GiraphFileInputFormat.getVertexSplits(GiraphFileInputFormat.java:318) at org.apache.giraph.io.TextVertexInputFormat.getSplits(TextVertexInputFormat.java:61) at org.apache.giraph.graph.BspServiceMaster.generateInputSplits(BspServiceMaster.java:266) ... 3 more On Mon, Mar 11, 2013 at 3:07 PM, Ameet Kini wrote: > Following up on my below note, here's what I see as far as the > zookeeper directory in /tmp. I don't know where else to look for > possible errors. Any pointers ? > > akini@karadi:~$ ls -tal > /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/ > total 36 > drwxrwxr-x 3 akini akini 4096 Mar 11 15:01 . > drwxrwxr-x 3 akini akini 4096 Mar 11 15:00 _bspZooKeeper > drwxrwxrwt 29 root root 24576 Mar 11 15:00 .. > akini@karadi:~$ ls -tal > /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/_bspZooKeeper/ > total 12 > drwxrwxr-x 3 akini akini 4096 Mar 11 15:01 .. > drwxrwxr-x 2 akini akini 4096 Mar 11 15:00 version-2 > drwxrwxr-x 3 akini akini 4096 Mar 11 15:00 . > akini@karadi:~$ ls -tal > /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/_bspZooKeeper/version-2/ > total 20 > -rw-rw-r-- 1 akini akini 67108880 Mar 11 15:00 log.1 > drwxrwxr-x 2 akini akini 4096 Mar 11 15:00 . > -rw-rw-r-- 1 akini akini 296 Mar 11 15:00 snapshot.0 > drwxrwxr-x 3 akini akini 4096 Mar 11 15:00 .. > > > > > > On Sun, Mar 10, 2013 at 11:49 AM, Ameet Kini wrote: >> I'm launching the unit test from inside Eclipse. There are no other >> errors on the Eclipse console, so I assumed it is the KeeperException >> that is causing the test hang. But there may be errors in some log >> file somewhere which I am missing?? Not sure where to look in this >> case. >> >> I don't have the environment in front of me right now but I did verify >> that zookeeper is indeed brought up, at least, attempted to be brought >> up. This I verified by digging into the unit test code and getting the >> path to the zookeeper's directory and verifying that it contains >> non-zero sized files. They were binary files so can't get any more >> info from them. >> >> If I should be looking at some other log file for possible errors, >> would greatly appreciate a pointer. >> >> Thanks, >> Ameet >> >> >> On Sat, Mar 9, 2013 at 10:16 PM, Avery Ching wrote: >>> I'm guessing there is another error. How are you launching it? >>> >>> >>> On 3/9/13 12:42 PM, Ameet Kini wrote: >>>> >>>> The test hangs after those messages. I assumed it is because of the >>>> KeeperException messages. >>>> >>>> Ameet >>>> >>>> On Sat, Mar 9, 2013 at 12:14 PM, Avery Ching wrote: >>>>> >>>>> I think those are info level logs rather than actual issues. If your job >>>>> completes successfully, I wouldn't worry about it. >>>>> >>>>> >>>>> On 3/8/13 12:31 PM, Ameet Kini wrote: >>>>>> >>>>>> Hi folks, >>>>>> >>>>>> I am trying to run the SimpleShortestPathsVertexTest example >>>>>> introduced by the unit testing tool as part of >>>>>> (https://issues.apache.org/jira/browse/GIRAPH-51) and see the below >>>>>> zookeeper exception while running the testToyData method. I can run >>>>>> giraph applications from the command-line and have confirmed that the >>>>>> worker node can bring up zookeeper ok. Is there a configuration step I >>>>>> am missing while running the unit test tool? >>>>>> >>>>>> Thanks, >>>>>> Ameet >>>>>> >>>>>> >>>>>> >>>>>> [14:56:25] INFO: [ZooKeeperServerMain] Starting server >>>>>> [14:56:25] INFO: [GiraphJob] run: Since checkpointing is disabled >>>>>> (default), do not allow any task retries (setting >>>>>> mapred.map.max.attempts = 0, old value = 4) >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 >>>>>> GMT >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server environment:host.name=dodo >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:java.version=1.6.0_24 >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server environment:java.vendor=Sun >>>>>> Microsystems Inc. >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> >>>>>> >>>>>> environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/a\ >>>>>> >>>>>> >>>>>> md64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib\ >>>>>> /jni:/lib:/usr/lib >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:java.io.tmpdir=/tmp >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:java.compiler= >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server environment:os.name=Linux >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server environment:os.arch=amd64 >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:os.version=3.2.0-29-generic >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server environment:user.name=akini >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:user.home=/home/akini >>>>>> [14:56:25] INFO: [ZooKeeperServer] Server >>>>>> environment:user.dir=/home/akini/workspace/giraph_test >>>>>> [14:56:25] INFO: [ZooKeeperServer] tickTime set to 2000 >>>>>> [14:56:25] INFO: [ZooKeeperServer] minSessionTimeout set to 10000 >>>>>> [14:56:25] INFO: [ZooKeeperServer] maxSessionTimeout set to 100000 >>>>>> [14:56:25] WARN: [JobClient] Use GenericOptionsParser for parsing the >>>>>> arguments. Applications should implement Tool for the same. >>>>>> [14:56:25] INFO: [NIOServerCnxn] binding to port 0.0.0.0/0.0.0.0:22182 >>>>>> [14:56:25] INFO: [FileTxnSnapLog] Snapshotting: 0 >>>>>> [14:56:25] INFO: [JobClient] Running job: job_201303070954_0007 >>>>>> [14:56:26] INFO: [JobClient] map 0% reduce 0% >>>>>> [14:56:30] INFO: [NIOServerCnxn] Accepted socket connection from >>>>>> /127.0.0.1:43076 >>>>>> [14:56:30] INFO: [NIOServerCnxn] Client attempting to establish new >>>>>> session at /127.0.0.1:43076 >>>>>> [14:56:30] INFO: [FileTxnLog] Creating new log file: log.1 >>>>>> [14:56:31] INFO: [NIOServerCnxn] Established session >>>>>> 0x13d4b936bc30000 with negotiated timeout 60000 for client >>>>>> /127.0.0.1:43076 >>>>>> [14:56:31] INFO: [NIOServerCnxn] Accepted socket connection from >>>>>> /127.0.0.1:43077 >>>>>> [14:56:31] INFO: [NIOServerCnxn] Client attempting to establish new >>>>>> session at /127.0.0.1:43077 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30000 >>>>>> type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_masterElectionDir >>>>>> Error:KeeperErrorCode = NoNode for >>>>>> /_hadoopBsp/job_201303070954_0007/_masterElectionDir >>>>>> [14:56:31] INFO: [NIOServerCnxn] Established session >>>>>> 0x13d4b936bc30001 with negotiated timeout 60000 for client >>>>>> /127.0.0.1:43077 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_masterJobState >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007/_masterJobState >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30000 >>>>>> type:create cxid:0xc zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0 >>>>>> Error:KeeperErrorCode = NoNode for >>>>>> /_hadoopBsp/job_201303070954_0007/_applicationA\ >>>>>> ttemptsDir/0 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x3 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007/_applicatio\ >>>>>> nAttemptsDir >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30000 >>>>>> type:create cxid:0xd zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error Path:/_hadoopBsp Error:KeeperErrorCode = >>>>>> NodeExists for /_hadoopBsp >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30000 >>>>>> type:create cxid:0xe zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007 >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x5 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0 >>>>>> Error:KeeperErrorCode = NoNode for >>>>>> /_hadoopBsp/job_201303070954_0007/_applicationA\ >>>>>> ttemptsDir/0 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30000 >>>>>> type:create cxid:0xf zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007/_applicatio\ >>>>>> nAttemptsDir >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x6 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error Path:/_hadoopBsp Error:KeeperErrorCode = >>>>>> NodeExists for /_hadoopBsp >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x7 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007 >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007 >>>>>> [14:56:31] INFO: [PrepRequestProcessor] Got user-level >>>>>> KeeperException when processing sessionid:0x13d4b936bc30001 >>>>>> type:create cxid:0x8 zxid:0xfffffffffffffffe txntype:un\ >>>>>> known reqpath:n/a Error >>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir >>>>>> Error:KeeperErrorCode = NodeExists for >>>>>> /_hadoopBsp/job_201303070954_0007/_application >>>>> >>>>> >>>