giraph-user mailing list archives

From Bu Xiao <buxia...@gmail.com>
Subject Re: MySQL Table
Date Fri, 06 Sep 2013 18:06:31 GMT
Thanks, Claudio and Gustavo, for your answers. I have another question. I run
my algorithm on a cluster that has 20 nodes. When I set the number of
workers to 10 or more, the algorithm works well and produces the
expected output. But if the number of workers is less than 10, I get the
following exception in ZooKeeper.
2013-09-06 10:39:04,313 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 0 connections, (0 total connected)
0 failed, 0 failures total.
2013-09-06 10:39:04,313 INFO org.apache.giraph.partition.PartitionBalancer:
balancePartitionsAcrossWorkers: Using algorithm static
2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils:
analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname=
node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname=
node7.cluster.net, MRtaskID=1, port=30001) - 200000
2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils:
analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname=
node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname=
node5.cluster.net, MRtaskID=2, port=30002) - 10088901
2013-09-06 10:39:04,339 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: 0 out of 5 workers finished on superstep 2 on path
/_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2013-09-06 10:39:04,340 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5,
node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
2013-09-06 10:40:15,255 INFO
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window
metrics MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0,
MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0,
secs waited = 71.241
2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: 3 out of 5 workers finished on superstep 2 on path
/_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Waiting on [node1.cluster.net_5, node7.cluster.net_1]
2013-09-06 10:40:15,388 INFO org.apache.giraph.master.BspServiceMaster:
aggregateWorkerStats: Aggregation found
(vtx=1000000,finVtx=0,edges=50099927,msgCount=0,msgBytesCount=0,haltComputation=false)
on superstep = 2
2013-09-06 10:40:15,394 INFO org.apache.giraph.master.BspServiceMaster:
coordinateSuperstep: Cleaning up old Superstep
/_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/1
2013-09-06 10:40:15,531 INFO org.apache.giraph.master.MasterThread:
masterThread: Coordination of superstep 2 took 71.313 seconds ended with
state THIS_SUPERSTEP_DONE and is now on superstep 3
2013-09-06 10:40:15,563 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 0 connections, (0 total connected)
0 failed, 0 failures total.
2013-09-06 10:40:15,563 INFO org.apache.giraph.partition.PartitionBalancer:
balancePartitionsAcrossWorkers: Using algorithm static
2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils:
analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname=
node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname=
node7.cluster.net, MRtaskID=1, port=30001) - 200000
2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils:
analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname=
node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname=
node5.cluster.net, MRtaskID=2, port=30002) - 10088901
2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: 0 out of 5 workers finished on superstep 3 on path
/_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/3/_workerFinishedDir
2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5,
node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster:
superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=
node7.cluster.net, MRtaskID=1, port=30001) on superstep 3
2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster:
superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=
node9.cluster.net, MRtaskID=4, port=30004) on superstep 3
2013-09-06 10:50:18,111 INFO org.apache.giraph.master.MasterThread:
masterThread: Coordination of superstep 3 took 602.58 seconds ended with
state WORKER_FAILURE and is now on superstep 3
2013-09-06 10:50:18,118 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1
        at
org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2013-09-06 10:50:18,119 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg =
java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException:
-1
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at
org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2013-09-06 10:50:18,122 INFO org.apache.giraph.zk.ZooKeeperManager: run:
Shutdown hook started.
2013-09-06 10:50:18,122 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
process.
2013-09-06 10:50:18,495 INFO org.apache.zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x140f459adcd0000, likely server
has closed socket, closing socket connection and attempting reconnect
2013-09-06 10:50:18,496 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143
typically means killed).

Thank you.
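
One possible reading of the trace above, offered as an assumption and not confirmed anywhere in this thread: after the WORKER_FAILURE on superstep 3 the master tries to restart from the last good checkpoint, and if no checkpoint was ever written, taking the last entry of an empty listing produces index -1. The sketch below only illustrates that failure pattern. The class and method names are hypothetical and this is not Giraph's actual code.

```java
import java.util.Arrays;
import java.util.OptionalLong;

// Hypothetical illustration only; names are not taken from Giraph.
public class LastCheckpointSketch {

    /** Unguarded lookup: returns the last element of the sorted array. */
    public static long lastUnguarded(long[] finalizedSupersteps) {
        long[] sorted = finalizedSupersteps.clone();
        Arrays.sort(sorted);
        // With an empty array, length - 1 is -1, so this line throws
        // ArrayIndexOutOfBoundsException.
        return sorted[sorted.length - 1];
    }

    /** Guarded lookup: an empty result means "no checkpoint exists". */
    public static OptionalLong lastGuarded(long[] finalizedSupersteps) {
        return Arrays.stream(finalizedSupersteps).max();
    }

    public static void main(String[] args) {
        long[] none = {};
        try {
            lastUnguarded(none);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
        System.out.println("guarded lookup has a value: "
            + lastGuarded(none).isPresent());
        System.out.println("guarded lookup on {0, 4, 2}: "
            + lastGuarded(new long[] {0, 4, 2}).getAsLong());
    }
}
```

If this reading is right, the ArrayIndexOutOfBoundsException is a secondary symptom, and the question to investigate is why the workers on node7 and node9 went missing during superstep 3.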


On Fri, Sep 6, 2013 at 3:51 AM, Gustavo Enrique Salazar Torres <
gsalazar@ime.usp.br> wrote:

> Hi Bu:
> Until the interface with Gora is available, you could use Apache Sqoop to
> import your MySQL table into HDFS and then run your Giraph job.
>
> Cheers
> Gustavo
> On 06/09/2013 04:43, "Claudio Martella" <claudio.martella@gmail.com>
> wrote:
>
>> Hi Bu,
>>
>> no, currently we do not have a DBInputFormat. We have an open issue with
>> a Google Summer of Code student working on a GoraInputFormat, which
>> also supports reading from RDBMSs through Gora. However, if/when it gets
>> in, it will not provide semantics as rich as DBInputFormat's, e.g. you'll
>> only be able to provide scan-like/range queries, instead of ANY query as
>> with DBInputFormat.
>>
>> I think that creating a DB[Vertex|Edge]InputFormat starting from the
>> Hadoop DBInputFormat should not be too hard and could prove to be a very
>> useful contribution. If you think about providing an implementation, I can
>> provide guidance.
>>
>> Best,
>> Claudio
>>
>>
>> On Fri, Sep 6, 2013 at 1:45 AM, Bu Xiao <buxiao82@gmail.com> wrote:
>>
>>> Hi Girapher,
>>>
>>>        I am currently working on an algorithm that requires reading the
>>> vertices from a MySQL table and not from HDFS. I thought that there has to
>>> be a way of reading data from a SQL table since Giraph is built on top of
>>> Hadoop. But I cannot seem to figure this part out. Do you have a class
>>> similar to the DBInputFormat in Hadoop? Thank you very much for your help.
>>>
>>>
>>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com
>>
>
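
For the Sqoop route suggested in the thread, the imported rows still have to be turned into a text layout that a Giraph input format can read. The sketch below is only an illustration under stated assumptions: the table is a hypothetical edge list exported as comma-separated lines of the form source,target,weight, every vertex value defaults to 0.0, and the target layout is the one-JSON-array-per-line form [id,value,[[target,weight],...]] described for Giraph's JsonLongDoubleFloatDoubleVertexInputFormat. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Giraph, Hadoop, or Sqoop.
public class EdgeRowsToVertexLines {

    /**
     * Groups "source,target,weight" rows by source vertex and emits one
     * line per source in the form [id,0.0,[[target,weight],...]].
     * Vertices that appear only as targets get no line of their own.
     */
    public static List<String> toVertexLines(List<String> edgeRows) {
        // LinkedHashMap keeps sources in first-seen order.
        Map<Long, StringBuilder> edgesBySource = new LinkedHashMap<>();
        for (String row : edgeRows) {
            String[] fields = row.split(",");
            long source = Long.parseLong(fields[0].trim());
            long target = Long.parseLong(fields[1].trim());
            float weight = Float.parseFloat(fields[2].trim());
            StringBuilder edges =
                edgesBySource.computeIfAbsent(source, k -> new StringBuilder());
            if (edges.length() > 0) {
                edges.append(',');
            }
            edges.append('[').append(target).append(',')
                 .append(weight).append(']');
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, StringBuilder> e : edgesBySource.entrySet()) {
            lines.add("[" + e.getKey() + ",0.0,[" + e.getValue() + "]]");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,2,1.0", "1,3,2.5", "2,3,1.0");
        for (String line : toVertexLines(rows)) {
            System.out.println(line);
        }
    }
}
```

On a real table this grouping would more likely run as a small MapReduce or SQL step before the Giraph job; the in-memory version here only shows the row-to-line mapping.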
