giraph-user mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: Deadlock when running on Hadoop 1.0.4
Date Fri, 25 Jan 2013 23:02:45 GMT
Interesting. A dedicated ZK instance doesn't work with hadoop-2.0.x or trunk
either when running Giraph on YARN/MRv2. I would like to look into this
more if I have time. Anyone have any ideas? And does anyone have a definite
timeline on how long this has been broken? Most of my work with Giraph last
summer was on a cluster with its own ZK, so I have not used the feature
much. I do remember it working on a 1.0.something Hadoop profile at maybe
Christmas of 2011? But that was a long time ago...


On Fri, Jan 25, 2013 at 3:07 AM, Sebastian Schelter <ssc@apache.org> wrote:

> Hi,
>
> I get exactly the same deadlock when using a dedicated (non-distributed)
> ZK instance. I tried 3.3.6 and 3.4.5.
>
> I haven't used Giraph for a while, so I can't say whether this has
> worked recently...
>
> Best,
> Sebastian
>
>
>
> On 23.01.2013 05:14, Eli Reisman wrote:
> > Hi Sebastian,
> >
> > This seems to be a new issue related to our recent upgrade to
> > multithreading. I have not seen this before. All my other emails were
> > related to the array index out of bounds error you found over the weekend.
> >
> > However, I have had trouble with the local ZK instance for some time now
> > on a number of Giraph profiles and pretty much exclusively use a separate
> > ZK instance of my own. Last summer I was running a lot of jobs on a 1.0.x
> > Hadoop cluster with Giraph, and I was told to use the on-cluster dedicated
> > ZK quorum due to "problems" with Giraph's local ZK instantiation. No one
> > got more specific with me than that. I also can't get the local ZK
> > instances to come up on Hadoop-2.0.x -- perhaps this feature of Giraph has
> > had problems for a while. Was it working for you recently?
> >
> > If you notice any other clues as to the cause, please post them. I'm
> > hoping to do some work around this soon.
> >
> > On Tue, Jan 22, 2013 at 11:05 AM, Claudio Martella <
> > claudio.martella@gmail.com> wrote:
> >
> >> Hi Sebastian,
> >>
> >> I do not know what is happening. I am also having problems with jobs
> >> blocking while waiting to set up the ZooKeeper instance.
> >> We should look into this.
> >>
> >> Best,
> >> Claudio
> >>
> >>
> >> On Mon, Jan 21, 2013 at 1:59 PM, Sebastian Schelter <ssc@apache.org> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm testing a custom PageRank implementation using trunk on Hadoop
> >>> 1.0.4. I seem to run into a deadlock after the input superstep.
> >>>
> >>> The workers report "finishSuperstep: (all workers done) WORKER_ONLY -
> >>> Attempt=0, Superstep=0" and the master reports that all workers are
> >>> done with superstep -1.
> >>>
> >>> I reproduced this using a local setup, and it seems to me that the
> >>> BspServiceMaster hangs in the cleanUpZooKeeper method indefinitely.
> >>>
> >>> I'm not using a dedicated ZK instance, I just have Giraph start one.
> >>> Any ideas what can be done to fix my problem?
> >>>
> >>> Best,
> >>> Sebastian
> >>>
> >>>
> >>> excerpt from jstack
> >>>
> >>> "org.apache.giraph.master.MasterThread" prio=10 tid=0x00007f29fc385000 nid=0x29d1 waiting on condition [0x00007f2a09a5f000]
> >>>    java.lang.Thread.State: TIMED_WAITING (parking)
> >>>         at sun.misc.Unsafe.park(Native Method)
> >>>         - parking to wait for  <0x00000000f38967d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
> >>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
> >>>         at org.apache.giraph.zk.PredicateLock.waitMsecs(PredicateLock.java:112)
> >>>         at org.apache.giraph.zk.PredicateLock.waitForever(PredicateLock.java:138)
> >>>         at org.apache.giraph.master.BspServiceMaster.cleanUpZooKeeper(BspServiceMaster.java:1602)
> >>>         at org.apache.giraph.master.BspServiceMaster.cleanup(BspServiceMaster.java:1692)
> >>>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:144)
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >>    Claudio Martella
> >>    claudio.martella@gmail.com
> >>
> >
>
>

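For anyone reading the jstack excerpt quoted above: the master thread shows
TIMED_WAITING rather than WAITING even though it is inside waitForever. One
pattern that would produce that trace is a "wait forever" built as a loop of
bounded waits on a condition. The sketch below is only an illustration of that
pattern using java.util.concurrent; the class name PredicateLockSketch, the
1000 ms polling interval, and the method bodies are assumptions, not Giraph's
actual PredicateLock source.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a one-shot event lock; not Giraph's real code. */
final class PredicateLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean eventOccurred = false;

    /** Record that the event happened and wake every waiting thread. */
    void signal() {
        lock.lock();
        try {
            eventOccurred = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Wait at most msecs milliseconds; returns true if the event occurred. */
    boolean waitMsecs(long msecs) throws InterruptedException {
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(msecs);
        lock.lock();
        try {
            // Re-check the predicate after every wakeup (guards against
            // spurious wakeups) and give up once the time budget is spent.
            while (!eventOccurred) {
                if (nanosLeft <= 0) {
                    return false;
                }
                nanosLeft = cond.awaitNanos(nanosLeft);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Loop of bounded waits. If nobody ever calls signal(), this never
     * returns, and a thread dump shows the caller as TIMED_WAITING.
     */
    void waitForever() throws InterruptedException {
        while (!waitMsecs(1000)) {
            // Assumed polling interval; a real implementation might log here.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PredicateLockSketch done = new PredicateLockSketch();
        System.out.println(done.waitMsecs(50)); // false: nothing signalled yet
        Thread signaller = new Thread(done::signal);
        signaller.start();
        done.waitForever(); // returns once the signaller has run
        signaller.join();
        System.out.println(done.waitMsecs(0)); // true: event already recorded
    }
}
```

If the real lock behaves like this, the hang in cleanUpZooKeeper would mean the
event the master is waiting on is never signalled, so the interesting question
is which watcher or callback was supposed to fire it.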