Return-Path: X-Original-To: apmail-giraph-user-archive@www.apache.org Delivered-To: apmail-giraph-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4720BDF79 for ; Thu, 16 May 2013 21:00:05 +0000 (UTC) Received: (qmail 4582 invoked by uid 500); 16 May 2013 21:00:05 -0000 Delivered-To: apmail-giraph-user-archive@giraph.apache.org Received: (qmail 4520 invoked by uid 500); 16 May 2013 21:00:05 -0000 Mailing-List: contact user-help@giraph.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@giraph.apache.org Delivered-To: mailing list user@giraph.apache.org Received: (qmail 4512 invoked by uid 99); 16 May 2013 21:00:05 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 May 2013 21:00:04 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE X-Spam-Check-By: apache.org Received-SPF: error (athena.apache.org: local policy) Received: from [24.173.238.83] (HELO mail.potomacfusion.com) (24.173.238.83) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 May 2013 21:00:00 +0000 Received: from PFI-MAIL.PFI.LOCAL ([192.168.100.7]) by pfi-mail ([192.168.100.7]) with mapi; Thu, 16 May 2013 15:59:19 -0500 From: Eric Kimbrel To: "user@giraph.apache.org" Date: Thu, 16 May 2013 16:00:00 -0500 Subject: Re: Broadcast of large aggregated value is slow. Thread-Topic: Broadcast of large aggregated value is slow. Thread-Index: Ac5SeDsUqiKNTabbSfeEbBKX97xgWA== Message-ID: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.3.0.121105 acceptlanguage: en-US Content-Type: multipart/alternative; boundary="_000_CDBA96882F5EEKimbrelpotomacfusioncom_" MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org --_000_CDBA96882F5EEKimbrelpotomacfusioncom_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable >From the attached logs in original post, you can see that both workers use = about 4 seconds of compute time on super step 4, but they complete super st= ep 4 about 10 minutes apart. Eric Kimbrel Software Engineer I Data Fusion & Analytics Sotera Defense Solutions, Inc. o: 360-516-6621 c: 360-990-1873 e: Eric.Kimbrel@soteradefense.com w: www.potomacfusion.com | www.soteradefense= .com Agility. Ingenuity. Integrity. From: Eric Kimbrel > Reply-To: "user@giraph.apache.org" > Date: Thursday, May 16, 2013 1:50 PM To: "user@giraph.apache.org" > Subject: Broadcast of large aggregated value is slow. I have an giraph job in which the Master will read a chunk of a file from H= DFS, and then use an aggregator to broadcast the data to all vertices. No = other messages are sent, and no vertices aggregate values, only the master. In the attached logs you can see that the time spent to broadcast the data = to all vertices is slow, and seems to be hanging up somehwere. It appears = that the majority of workers receive the data in 10-15 seconds, but then no= thing happens for around 10 minutes. Log snippet shown below Is there a known reason why transmitting this data during the synchronizati= on is taking so long, or anything that can be done to speed it up? 2013-05-16 11:09:03,041 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 30 more tasks to send their aggregator= data 2013-05-16 11:09:14,444 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 10 more tasks to send their aggregator= data, task ids: [13, 20, 22, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:09:25,190 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:09:45,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:10:05,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:10:15,192 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:10:35,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:10:55,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:11:05,194 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:11:25,195 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:11:45,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:12:05,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:12:15,197 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:12:35,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:12:55,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:13:05,199 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:13:25,200 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:13:45,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:14:05,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:14:15,202 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:14:35,203 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:14:55,204 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:15:15,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:15:35,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:15:45,206 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:16:05,207 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:16:25,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:16:45,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:16:55,209 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:17:15,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:17:35,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:17:45,211 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:18:05,212 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:= waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator = data, task ids: [13, 34, 38, 50, 48, 52, 58, 56] 2013-05-16 11:18:19,841 INFO org.apache.giraph.comm.netty.handler.RequestDe= coder: decode: Server window metrics MBytes/sec sent =3D 0, MBytes/sec rece= ived =3D 0.027, MBytesSent =3D 0.0006, MBytesReceived =3D 15.4028, ave sent= req MBytes =3D 0, ave received req MBytes =3D 0.0034, secs waited =3D 571.= 136 Eric Kimbrel Software Engineer I Data Fusion & Analytics Sotera Defense Solutions, Inc. o: 360-516-6621 c: 360-990-1873 e: Eric.Kimbrel@soteradefense.com w: www.potomacfusion.com | www.soteradefense= .com Agility. Ingenuity. Integrity. --_000_CDBA96882F5EEKimbrelpotomacfusioncom_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable
From the attac= hed logs in original post, you can see that both workers use about 4 second= s of compute time on super step 4, but they complete super step 4 about 10 = minutes apart.


Eric Kimbrel
Software Enginee= r
 I Data Fusion &= amp; Analytics

Sotera Defense Solutions, Inc.<= /span>

o: 3= 60-516-6621

c: 360-990-1873

e: Eric.Kimbrel@soteradefense.com 
w: 
 | =

Agility= I= ngenuityIntegrity.



From: Eric Kimbrel <Eric.Kimbrel@soteradefense.com>
Reply-To: "
user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, May 16, 2013 1:50 PM
To: "user@giraph.apache.org" <user@giraph.apache.org>
S= ubject: Broadcast of large aggregated value is slow.
=

=
I have a= n giraph job in which the Master will read a chunk of a file from HDFS, and= then use an aggregator to broadcast the data to all vertices.  No oth= er messages are sent, and no vertices aggregate values, only the master.

In the attached logs you can see that the time spe= nt to broadcast the data to all vertices is slow, and seems to be hanging u= p somehwere.  It appears that the majority of workers receive the data in 10-15 seconds, but then nothing happens for around 10 minutes.  Lo= g snippet shown below

Is there a known reason why transmitting this data dur= ing the synchronization is taking so long, or anything that can be done to = speed it up?

2013-05-16 11:09:03,041 INFO org.apache.giraph.u=
tils.TaskIdsPermitsBarrier: waitForRequiredPermits: Waiting for 30 more tas=
ks to send their aggregator data
2013-05-16 11:09:14,444 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 10 more tasks to send their aggregator=
 data, task ids: [13, 20, 22, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:25,190 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:45,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:05,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:15,192 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:35,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:55,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:05,194 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:25,195 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:45,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:05,196 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:15,197 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:35,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:12:55,198 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:05,199 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:25,200 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:13:45,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:05,201 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:15,202 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:35,203 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:14:55,204 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:15,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:35,205 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:15:45,206 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:05,207 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:25,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:45,208 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:16:55,209 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:15,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:35,210 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:17:45,211 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:18:05,212 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier:=
 waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator =
data, task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:18:19,841 INFO org.apache.giraph.comm.netty.handler.RequestDe=
coder: decode: Server window metrics MBytes/sec sent =3D 0, MBytes/sec rece=
ived =3D 0.027, MBytesSent =3D 0.0006, MBytesReceived =3D 15.4028, ave sent=
 req MBytes =3D 0, ave received req MBytes =3D 0.0034, secs waited =3D 571.=
136






Eric Kimbrel
Software Engineer
 
I&nb= sp;Data Fusion & Analytics

Sotera Defense Solutio= ns, Inc.

o: 360-516-6621

c: 360-990-1873

e: Eric.Kimbrel@soteradefense.com 
w: 
www.potomacfusion.com |&nbs= p;www.soteradefense.com

AgilityIngenuityIntegrity

<= /div>
--_000_CDBA96882F5EEKimbrelpotomacfusioncom_--