From: Maja Kabiljo
To: "user@giraph.apache.org"
Subject: Re: Waiting for times required to be 19 (currently 18)
Date: Thu, 21 Feb 2013 17:48:24 +0000

Hi Nate,

When did you take the new Giraph code? Please check whether you have the GIRAPH-506 patch in; if not, that's probably the reason for the issue.

Maja

From: Nate <touring_fan@msn.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, February 21, 2013 8:06 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Waiting for times required to be 19 (currently 18)

I recently upgraded older Giraph code built against CDH3 to a git checkout from a few days ago that builds against CDH4.1.0 (MRv1) libraries. All of the Giraph tests pass.

When running my Giraph job with 20 workers, I usually get the above error in 19 map processes:

org.apache.giraph.utils.ExpectedBarrier: waitForRequiredPermits: Waiting for times required to be 19 (currently 18)

One map worker always shows something like:

org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 1 open requests, waiting for it to be <= 0,and some metrics ....
org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting for request (destTask=17, reqId=5032) - (reqId=5326,destAddr=host1:30017,elapsedNanos=..., started=..., writeDone=true, writeSuccess=true)
repeats...

I say "usually" because the same Giraph job does complete, but only rarely. I have a timeout of 100 minutes set, and the job is killed after that much time has elapsed.

Also, the started field in the above output in this past run reads: "Wed Jan 21 14:21:31 EST 1970". All machines are synchronized by a single time server and currently read accurate times. I don't think it affected the execution, but it still seems erroneous.
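
A date in January 1970 is what one would see if a monotonic clock reading (for example from `System.nanoTime()`), rather than wall-clock time, were formatted as a date. That is only an assumption here; the thread does not say how the started field is computed. A minimal sketch of the effect, where the class name, the method name, and the 20-day reading are all made up for illustration:

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class MonotonicAsEpoch {
    /** Formats a millisecond count as if it were milliseconds since the Unix epoch (UTC). */
    public static String renderAsEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    public static void main(String[] args) {
        // Hypothetical monotonic reading: about 20 days and 19 hours, in nanoseconds.
        // Such a value measures elapsed time from an arbitrary origin, not from the epoch.
        long monotonicNanos = TimeUnit.DAYS.toNanos(20) + TimeUnit.HOURS.toNanos(19);
        long asMillis = TimeUnit.NANOSECONDS.toMillis(monotonicNanos);

        // Treated as epoch milliseconds, the reading lands in January 1970.
        System.out.println(renderAsEpochMillis(asMillis));

        // A real wall-clock timestamp formats to the current date.
        System.out.println(renderAsEpochMillis(System.currentTimeMillis()));
    }
}
```

If this is the cause, the odd date would be purely cosmetic, which is consistent with the synchronized machine clocks described above.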

I also don't see Hadoop maps having status messages set on them. I see the GraphMapper giving the Context object to the GraphTaskManager instance, and I can see it calling "context.setStatus(...)" but those messages never show up in the map status column in the job tracker page.

Is there something I've missed while upgrading the old code?