giraph-user mailing list archives

From Maja Kabiljo <majakabi...@fb.com>
Subject Re: Waiting for times required to be 19 (currently 18)
Date Thu, 21 Feb 2013 17:48:24 GMT
Hi Nate,

When did you check out the new Giraph code? Please check whether you have the GIRAPH-506 patch;
if not, that's probably the reason for the issue.

Maja

From: Nate <touring_fan@msn.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, February 21, 2013 8:06 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Waiting for times required to be 19 (currently 18)

I recently upgraded older Giraph code built against CDH3 to a git checkout from a few days
ago that builds against CDH4.1.0 (MRv1) libraries.  All of the Giraph tests pass.

When running my Giraph job with 20 workers, I usually get the above error in 19 map processes:

org.apache.giraph.utils.ExpectedBarrier: waitForRequiredPermits: Waiting for times required
to be 19 (currently 18)
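To make the symptom concrete, here is a minimal Java sketch of a counting barrier that produces a message of the same shape. It is not Giraph's ExpectedBarrier: the class `PermitBarrier`, its methods, and the assumption that each permit corresponds to one signal from a peer are all hypothetical, chosen only to show why a single missing signal leaves the waiter stuck at "currently 18".

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counting barrier; NOT Giraph's ExpectedBarrier. The class and
// method names are illustrative assumptions modeled on the log line above.
final class PermitBarrier {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition arrived = lock.newCondition();
    private int permits = 0;

    // Called once per expected signal (assumed: one per peer that finishes).
    void releaseOne() {
        lock.lock();
        try {
            permits++;
            arrived.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until `required` permits exist. Logs after every timed-out
    // interval and gives up (returns false) after maxTimeouts of them.
    boolean waitForRequiredPermits(int required, long intervalMs, int maxTimeouts) {
        lock.lock();
        try {
            int timeouts = 0;
            while (permits < required) {
                boolean signalled = arrived.await(intervalMs, TimeUnit.MILLISECONDS);
                if (!signalled && permits < required) {
                    System.out.println("Waiting for times required to be " + required
                        + " (currently " + permits + ")");
                    if (++timeouts >= maxTimeouts) {
                        return false;
                    }
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

With 18 of 19 releases delivered, the waiter logs on every interval and never proceeds; in a real job without a give-up limit, that would continue until the task timeout kills it.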

One map worker always shows something like:

org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs,
1 open requests, waiting for it to be <= 0,and some metrics ....
org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting for request (destTask=17,
reqId=5032) - (reqId=5326,destAddr=host1:30017,elapsedNanos=..., started=..., writeDone=true,
writeSuccess=true)
repeats...
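The NettyClient output reads like a client waiting for its count of in-flight requests to drain. The Java sketch below is a hypothetical model of that pattern, not Giraph's NettyClient: the class `OpenRequests` and its methods are illustrative assumptions, and the reqId/destTask values in the usage simply mirror the log. It shows that if one response (here reqId 5032 to task 17) never arrives, the wait repeats indefinitely even though the write itself reported success.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for in-flight requests; NOT Giraph's NettyClient.
final class OpenRequests {
    // reqId -> destination task id, for requests still awaiting a response
    private final Map<Long, Integer> open = new ConcurrentHashMap<>();

    void sent(long reqId, int destTask) {
        open.put(reqId, destTask);
    }

    // Called when the response for reqId arrives; wakes any waiter.
    void responseReceived(long reqId) {
        open.remove(reqId);
        synchronized (this) {
            notifyAll();
        }
    }

    int openCount() {
        return open.size();
    }

    // Waits until at most maxOpen requests remain, logging the outstanding
    // ones each interval. Returns false after maxWaits intervals.
    synchronized boolean waitSomeRequests(int maxOpen, long intervalMs, int maxWaits) {
        int waits = 0;
        while (open.size() > maxOpen) {
            if (waits++ >= maxWaits) {
                return false;
            }
            System.out.println("Waiting interval of " + intervalMs + " msecs, "
                + open.size() + " open requests, waiting for it to be <= " + maxOpen);
            for (Map.Entry<Long, Integer> e : open.entrySet()) {
                System.out.println("  Waiting for request (destTask=" + e.getValue()
                    + ", reqId=" + e.getKey() + ")");
            }
            try {
                wait(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Because the waiter holds the monitor from the size check until `wait` releases it, a response that arrives in between still triggers `notifyAll` after the waiter is parked, so no wakeup is lost.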

I say "usually" because the same Giraph job does occasionally complete, though only rarely. I have
a timeout of 100 minutes set, and the job is killed once that much time has elapsed.

Also, the started field in the above output from this latest run reads "Wed Jan 21 14:21:31 EST
1970". All machines are synchronized by a single time server and currently report accurate times.
I don't think it affected the execution, but it still seems erroneous.
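A date a few weeks after January 1, 1970 is what you typically get when a relative clock reading is formatted as if it were wall-clock time. Since the same log line reports `elapsedNanos`, one plausible (unconfirmed) explanation is that `started` holds a `System.nanoTime()`-style value, whose origin is arbitrary (on many systems roughly boot time), converted to milliseconds and passed to `Date`. The helper class `RelativeClockDemo` below is hypothetical and only demonstrates that mistake.

```java
import java.time.Instant;
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of the suspected bug: treating a relative nanosecond
// reading (e.g. from System.nanoTime()) as milliseconds since the epoch.
final class RelativeClockDemo {
    // Interprets the relative reading as epoch time, which is the mistake.
    static Instant misreadAsInstant(long relativeNanos) {
        return Instant.ofEpochMilli(TimeUnit.NANOSECONDS.toMillis(relativeNanos));
    }

    // The same mistake expressed with java.util.Date, as a log would print it.
    static Date misreadAsDate(long relativeNanos) {
        return Date.from(misreadAsInstant(relativeNanos));
    }
}
```

For example, a reading equivalent to 20 days of uptime prints as January 21, 1970 (UTC), regardless of how accurately the machines' wall clocks are synchronized.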

I also don't see status messages set on the Hadoop map tasks. I can see GraphMapper passing its
Context object to the GraphTaskManager instance, and I can see "context.setStatus(...)" being called,
but those messages never show up in the map status column on the JobTracker page.

Is there something I've missed while upgrading the old code?
