giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ameet Kini <ameetk...@gmail.com>
Subject Re: Zookeeper exception while running SimpleShortestPathsVertexTest
Date Tue, 12 Mar 2013 16:56:08 GMT
I looked at this some more today, and turns out my original problem
was caused by something else and not the ZookeeperException, as Avery
suggested. Turns out my giraph project was built with hadoop 0.20.203
and I was using 0.20.2, which does not include the
org.apache.hadoop.security module. Rebuilding giraph with
-Phadoop_non_secure as recommended by README fixed that problem but
now I have a new one. The "Input path does not exist" below is caused
by the unit test looking for the directory containing "vertices.txt"
in HDFS and that file doesn't exist there. The parent directory
hdfs://localhost:9000/tmp/giraph-SimpleShortestPathsVertex-1988055955408768000
does exist, but doesn't contain vertices.txt. What is more confusing
is that the local directory
file:///tmp/giraph-SimpleShortestPathsVertex-1988055955408768000/
exists but also doesn't contain vertices.txt.

I initially thought that the unit test maintains all its data in the
local file system, but looks like its missing vertices.txt, and I'm
also unsure as to why TextVertexInputFormat looks for that directory
in HDFS versus the local file system.



	... 8 more
2013-03-12 12:33:42,386 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server: karadi/127.0.1.1:30010
with up to 16 threads on bind attempt 1 with sendBufferSize = 32768
receiveBufferSize = 524288 backlog = 1
2013-03-12 12:33:42,386 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server: karadi/127.0.1.1:30000
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768
receiveBufferSize = 524288 backlog = 1
2013-03-12 12:33:42,391 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after
requestEncoder.
2013-03-12 12:33:42,391 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after
requestEncoder.
2013-03-12 12:33:42,397 INFO org.apache.giraph.graph.BspServiceMaster:
becomeMaster: I am now the master!
2013-03-12 12:33:42,404 INFO org.apache.giraph.graph.GraphMapper:
setup: Registering health of this worker...
2013-03-12 12:33:42,412 INFO org.apache.giraph.graph.BspService:
getJobState: Job state already exists
(/_hadoopBsp/job_201303121015_0006/_masterJobState)
2013-03-12 12:33:42,428 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir already
exists!
2013-03-12 12:33:42,481 INFO org.apache.giraph.graph.BspService:
process: applicationAttemptChanged signaled
2013-03-12 12:33:42,481 INFO org.apache.giraph.graph.BspService:
process: applicationAttemptChanged signaled
2013-03-12 12:33:42,528 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir already
exists!
2013-03-12 12:33:42,612 WARN org.apache.giraph.graph.BspService:
process: Unknown and unprocessed event
(path=/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
2013-03-12 12:33:42,612 WARN org.apache.giraph.graph.BspService:
process: Unknown and unprocessed event
(path=/_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
2013-03-12 12:33:42,670 INFO org.apache.giraph.graph.BspServiceWorker:
registerHealth: Created my health node for attempt=0, superstep=-1
with /_hadoopBsp/job_201303121015_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/karadi_0
and workerInfo= Worker(hostname=karadi, MRtaskID=0, port=30010)
2013-03-12 12:33:42,702 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.graph.MasterThread, msg = generateVertexInputSplits:
Got IOException, exiting...
java.lang.IllegalStateException: generateVertexInputSplits: Got IOException
	at org.apache.giraph.graph.BspServiceMaster.generateInputSplits(BspServiceMaster.java:268)
	at org.apache.giraph.graph.BspServiceMaster.createInputSplits(BspServiceMaster.java:568)
	at org.apache.giraph.graph.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:633)
	at org.apache.giraph.graph.MasterThread.run(MasterThread.java:102)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist:
hdfs://localhost:9000/tmp/giraph-SimpleShortestPathsVertex-1988055955408768000/vertices.txt
	at org.apache.giraph.io.GiraphFileInputFormat.listStatus(GiraphFileInputFormat.java:232)
	at org.apache.giraph.io.GiraphFileInputFormat.listVertexStatus(GiraphFileInputFormat.java:247)
	at org.apache.giraph.io.GiraphFileInputFormat.getVertexSplits(GiraphFileInputFormat.java:318)
	at org.apache.giraph.io.TextVertexInputFormat.getSplits(TextVertexInputFormat.java:61)
	at org.apache.giraph.graph.BspServiceMaster.generateInputSplits(BspServiceMaster.java:266)
	... 3 more


On Mon, Mar 11, 2013 at 3:07 PM, Ameet Kini <ameetkini@gmail.com> wrote:
> Following up on my below note, here's what I see as far as the
> zookeeper directory in /tmp. I don't know where else to look for
> possible errors. Any pointers ?
>
> akini@karadi:~$ ls -tal
> /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/
> total 36
> drwxrwxr-x  3 akini akini  4096 Mar 11 15:01 .
> drwxrwxr-x  3 akini akini  4096 Mar 11 15:00 _bspZooKeeper
> drwxrwxrwt 29 root  root  24576 Mar 11 15:00 ..
> akini@karadi:~$ ls -tal
> /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/_bspZooKeeper/
> total 12
> drwxrwxr-x 3 akini akini 4096 Mar 11 15:01 ..
> drwxrwxr-x 2 akini akini 4096 Mar 11 15:00 version-2
> drwxrwxr-x 3 akini akini 4096 Mar 11 15:00 .
> akini@karadi:~$ ls -tal
> /tmp/giraph-SimpleShortestPathsVertex-3594578381801720832/_bspZooKeeper/version-2/
> total 20
> -rw-rw-r-- 1 akini akini 67108880 Mar 11 15:00 log.1
> drwxrwxr-x 2 akini akini     4096 Mar 11 15:00 .
> -rw-rw-r-- 1 akini akini      296 Mar 11 15:00 snapshot.0
> drwxrwxr-x 3 akini akini     4096 Mar 11 15:00 ..
>
>
>
>
>
> On Sun, Mar 10, 2013 at 11:49 AM, Ameet Kini <ameetkini@gmail.com> wrote:
>> I'm launching the unit test from inside Eclipse. There are no other
>> errors on the Eclipse console, so I assumed it is the KeeperException
>> that is causing the test hang. But there may be errors in some log
>> file somewhere which I am missing?? Not sure where to look in this
>> case.
>>
>> I don't have the environment in front of me right now but I did verify
>> that zookeeper is indeed brought up, at least, attempted to be brought
>> up. This I verified by digging into the unit test code and getting the
>> path to the zookeeper's directory and verifying that it contains
>> non-zero sized files. They were binary files so can't get any more
>> info from them.
>>
>> If I should be looking at some other log file for possible errors,
>> would greatly appreciate a pointer.
>>
>> Thanks,
>> Ameet
>>
>>
>> On Sat, Mar 9, 2013 at 10:16 PM, Avery Ching <aching@apache.org> wrote:
>>> I'm guessing there is another error. How are you launching it?
>>>
>>>
>>> On 3/9/13 12:42 PM, Ameet Kini wrote:
>>>>
>>>> The test hangs after those messages. I assumed it is because of the
>>>> KeeperException messages.
>>>>
>>>> Ameet
>>>>
>>>> On Sat, Mar 9, 2013 at 12:14 PM, Avery Ching <aching@apache.org> wrote:
>>>>>
>>>>> I think those are info level logs rather than actual issues.  If your
job
>>>>> completes successfully, I wouldn't worry about it.
>>>>>
>>>>>
>>>>> On 3/8/13 12:31 PM, Ameet Kini wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I am trying to run the SimpleShortestPathsVertexTest example
>>>>>> introduced by the unit testing tool as part of
>>>>>> (https://issues.apache.org/jira/browse/GIRAPH-51) and see the below
>>>>>> zookeeper exception while running the testToyData method. I can run
>>>>>> giraph applications from the command-line and have confirmed that
the
>>>>>> worker node can bring up zookeeper ok. Is there a configuration step
I
>>>>>> am missing while running the unit test tool?
>>>>>>
>>>>>> Thanks,
>>>>>> Ameet
>>>>>>
>>>>>>
>>>>>>
>>>>>> [14:56:25]  INFO: [ZooKeeperServerMain] Starting server
>>>>>> [14:56:25]  INFO: [GiraphJob] run: Since checkpointing is disabled
>>>>>> (default), do not allow any task retries (setting
>>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011
22:27
>>>>>> GMT
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server environment:host.name=dodo
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:java.version=1.6.0_24
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server environment:java.vendor=Sun
>>>>>> Microsystems Inc.
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>>
>>>>>>
>>>>>> environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/a\
>>>>>>
>>>>>>
>>>>>> md64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib\
>>>>>> /jni:/lib:/usr/lib
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:java.io.tmpdir=/tmp
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:java.compiler=<NA>
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server environment:os.name=Linux
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server environment:os.arch=amd64
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:os.version=3.2.0-29-generic
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server environment:user.name=akini
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:user.home=/home/akini
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] Server
>>>>>> environment:user.dir=/home/akini/workspace/giraph_test
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] tickTime set to 2000
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] minSessionTimeout set to 10000
>>>>>> [14:56:25]  INFO: [ZooKeeperServer] maxSessionTimeout set to 100000
>>>>>> [14:56:25]  WARN: [JobClient] Use GenericOptionsParser for parsing
the
>>>>>> arguments. Applications should implement Tool for the same.
>>>>>> [14:56:25]  INFO: [NIOServerCnxn] binding to port 0.0.0.0/0.0.0.0:22182
>>>>>> [14:56:25]  INFO: [FileTxnSnapLog] Snapshotting: 0
>>>>>> [14:56:25]  INFO: [JobClient] Running job: job_201303070954_0007
>>>>>> [14:56:26]  INFO: [JobClient]  map 0% reduce 0%
>>>>>> [14:56:30]  INFO: [NIOServerCnxn] Accepted socket connection from
>>>>>> /127.0.0.1:43076
>>>>>> [14:56:30]  INFO: [NIOServerCnxn] Client attempting to establish
new
>>>>>> session at /127.0.0.1:43076
>>>>>> [14:56:30]  INFO: [FileTxnLog] Creating new log file: log.1
>>>>>> [14:56:31]  INFO: [NIOServerCnxn] Established session
>>>>>> 0x13d4b936bc30000 with negotiated timeout 60000 for client
>>>>>> /127.0.0.1:43076
>>>>>> [14:56:31]  INFO: [NIOServerCnxn] Accepted socket connection from
>>>>>> /127.0.0.1:43077
>>>>>> [14:56:31]  INFO: [NIOServerCnxn] Client attempting to establish
new
>>>>>> session at /127.0.0.1:43077
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30000
>>>>>> type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_masterElectionDir
>>>>>> Error:KeeperErrorCode = NoNode for
>>>>>> /_hadoopBsp/job_201303070954_0007/_masterElectionDir
>>>>>> [14:56:31]  INFO: [NIOServerCnxn] Established session
>>>>>> 0x13d4b936bc30001 with negotiated timeout 60000 for client
>>>>>> /127.0.0.1:43077
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_masterJobState
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007/_masterJobState
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30000
>>>>>> type:create cxid:0xc zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0
>>>>>> Error:KeeperErrorCode = NoNode for
>>>>>> /_hadoopBsp/job_201303070954_0007/_applicationA\
>>>>>> ttemptsDir/0
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x3 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007/_applicatio\
>>>>>> nAttemptsDir
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30000
>>>>>> type:create cxid:0xd zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error Path:/_hadoopBsp Error:KeeperErrorCode =
>>>>>> NodeExists for /_hadoopBsp
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30000
>>>>>> type:create cxid:0xe zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x5 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir/0
>>>>>> Error:KeeperErrorCode = NoNode for
>>>>>> /_hadoopBsp/job_201303070954_0007/_applicationA\
>>>>>> ttemptsDir/0
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30000
>>>>>> type:create cxid:0xf zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007/_applicatio\
>>>>>> nAttemptsDir
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x6 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error Path:/_hadoopBsp Error:KeeperErrorCode =
>>>>>> NodeExists for /_hadoopBsp
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x7 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error Path:/_hadoopBsp/job_201303070954_0007
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007
>>>>>> [14:56:31]  INFO: [PrepRequestProcessor] Got user-level
>>>>>> KeeperException when processing sessionid:0x13d4b936bc30001
>>>>>> type:create cxid:0x8 zxid:0xfffffffffffffffe txntype:un\
>>>>>> known reqpath:n/a Error
>>>>>> Path:/_hadoopBsp/job_201303070954_0007/_applicationAttemptsDir
>>>>>> Error:KeeperErrorCode = NodeExists for
>>>>>> /_hadoopBsp/job_201303070954_0007/_application
>>>>>
>>>>>
>>>

Mime
View raw message