giraph-user mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: Deadlock when running on Hadoop 1.0.4
Date Wed, 23 Jan 2013 04:14:53 GMT
Hi Sebastian,

This seems to be a new issue related to our recent upgrade to
multithreading. I have not seen this before. All my other emails were
related to the array index out of bounds error you found over the weekend.

However, I have had trouble with the local ZK instance for some time now on
a number of Giraph profiles and pretty much exclusively use a separate ZK
instance of my own. Last summer I was running a lot of jobs on a 1.0.x
hadoop cluster with Giraph, and I was told to use the on-cluster dedicated
ZK quorum due to "problems" with Giraph's local ZK instantiation. No one
got more specific with me than that. I also can't get the local ZK
instances to come up on Hadoop-2.0.x -- perhaps this feature of Giraph has
had problems for a while. Was it working for you recently?
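
For anyone who wants to try the separate-ZK route, here is a minimal sketch of
how the quorum could be passed to a job. It assumes the job is launched through
a runner that honors Hadoop's generic -D options and that the quorum is set
through the giraph.zkList option; the class name, helper names, and host names
are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the option that points Giraph at an
// existing ZooKeeper quorum instead of letting it start a local one.
class ExternalZkArgs {
    // Option naming the external quorum as comma-separated host:port pairs.
    static final String ZK_LIST_KEY = "giraph.zkList";

    // Joins the hosts into "host1:port,host2:port,...".
    static String quorum(List<String> hosts, int port) {
        return hosts.stream()
                .map(host -> host + ":" + port)
                .collect(Collectors.joining(","));
    }

    // Formats the quorum as a Hadoop generic option (-Dkey=value).
    static String zkOption(List<String> hosts, int port) {
        return "-D" + ZK_LIST_KEY + "=" + quorum(hosts, port);
    }

    public static void main(String[] args) {
        // Placeholder host names; substitute the real quorum members.
        List<String> hosts = Arrays.asList(
                "zk1.example.com", "zk2.example.com", "zk3.example.com");
        System.out.println(zkOption(hosts, 2181));
    }
}
```

The resulting string would go on the job's command line ahead of the
job-specific arguments.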

If you notice any other clues as to the cause, please post them. I'm hoping
to do some work around this soon.

On Tue, Jan 22, 2013 at 11:05 AM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> Hi Sebastian,
>
> I do not know what is happening. I am also having problems with jobs
> blocking while waiting to set up the ZooKeeper instance.
> We should look into this.
>
> Best,
> Claudio
>
>
> On Mon, Jan 21, 2013 at 1:59 PM, Sebastian Schelter <ssc@apache.org> wrote:
>
>> Hi,
>>
>> I'm testing a custom PageRank implementation using trunk on Hadoop
>> 1.0.4. I seem to run into a deadlock after the input superstep.
>>
>> The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
>> Attempt=0, Superstep=0" and the master reports that all workers are done
>> with superstep -1.
>>
>> I reproduced this using a local setup, and it seems to me that the
>> BspServiceMaster hangs indefinitely in the cleanUpZooKeeper method.
>>
>> I'm not using a dedicated ZK instance; I just have Giraph start one. Any
>> ideas what can be done to fix my problem?
>>
>> Best,
>> Sebastian
>>
>>
>> excerpt from jstack
>>
>> "org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for  <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
>>         at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
>>         at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
>>         at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
>>         at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
>>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)
>>
>>
>>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>
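
A note on reading the jstack excerpt quoted above: the master thread shows up
as TIMED_WAITING even though the frame is waitForever, which is what one would
expect if an unbounded wait is built from repeated bounded waits. The sketch
below illustrates that pattern only. It is not Giraph's PredicateLock source;
the class name and the slice length are assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: an "infinite" wait made of repeated timed waits.
class TimedPredicateWait {
    private final Lock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean done = false;

    // Marks the event as having happened and wakes every waiter.
    void signal() {
        lock.lock();
        try {
            done = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits up to msecs; returns true if signalled, false on timeout.
    boolean waitMsecs(long msecs) {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(msecs);
        lock.lock();
        try {
            while (!done) {
                if (nanosLeft <= 0) {
                    return false;
                }
                // The thread parks here with a timeout (TIMED_WAITING).
                nanosLeft = changed.awaitNanos(nanosLeft);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            lock.unlock();
        }
    }

    // Keeps issuing bounded waits until the signal arrives. If signal()
    // is never called, this never returns.
    void waitForever(long sliceMsecs) {
        while (!waitMsecs(sliceMsecs)) {
            // Timed out; loop and wait again.
        }
    }

    public static void main(String[] args) {
        TimedPredicateWait w = new TimedPredicateWait();
        System.out.println("signalled before timeout: " + w.waitMsecs(10));
        new Thread(w::signal).start();
        w.waitForever(100);
        System.out.println("released");
    }
}
```

Under that reading, a thread dump taken at any moment shows a timed park, and
the hang itself comes from the awaited event never being signalled.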
