giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maja Kabiljo <majakabi...@fb.com>
Subject Re: more concurrency issues with requests management?
Date Thu, 21 Feb 2013 15:22:38 GMT
Yeah, that's exactly what I told you.
We have one ExtendedDataOutput per vertex. And the limit is about
(2^30)/4, since you will have double of the size allocated at some point.
What do you mean you don't want to use a combiner for evaluation reasons?

On 2/21/13 5:38 AM, "Claudio Martella" <claudio.martella@gmail.com> wrote:

>Yep. Actually more than a supernode sending messages it looks more like a
>supernode receiving a lot of messages, and hence filling the inbox queue
>(in bytearrayformat).
>I assumed a max (2^31)/4 (for float messages) limit because I assumed we
>had such an object per vertex, but it looks like the ByteArrayOutputFormat
>is used for more than a single vertex.
>Maja, do you think it could make sense to use one array per vertex (I
>haven't looked deep at the code, but I assume that we have one per
>partition now)?
>
>
>
>On Thu, Feb 21, 2013 at 12:49 AM, Eli Reisman
><apache.mailbox@gmail.com>wrote:
>
>> Sorry didn't see the bottom of this thread before! I like Maja's idea,
>>some
>> mega-supernode (thats a technical term) just sent that vertex so many
>> messages that the byte buffer size overflowed on the right shift. I saw
>> this problem in some other forms last summer. The number of other ways
>>that
>> number could go negative seem limited.
>>
>>
>> On Wed, Feb 20, 2013 at 3:41 PM, Eli Reisman <apache.mailbox@gmail.com
>> >wrote:
>>
>> > I do know when you run out of memory it often looks like the logs are
>> > fine, or seem to end in a wierd spot, but really the last X number of
>>log
>> > messages just doesn't get logged. Sometimes its worth looking at the
>> worker
>> > logs individually and one in every group of them will actually get a
>> > sensible exception message logged. Don't know if thats the case
>>though. I
>> > would tend to agree with your intuition regarding the queues and
>> > concurrency.
>> >
>> > On Tue, Feb 19, 2013 at 3:04 PM, Claudio Martella <
>> > claudio.martella@gmail.com> wrote:
>> >
>> >> Looking at the second line of the logs, it does not look like the
>>case,
>> >> but
>> >> it could be. do you have any suggestion for the flags in particular?
>>I'm
>> >> running with -XX:+HeapDumpOnOutOfMemoryError  now, and I don't get
>>any
>> >> dump.
>> >>
>> >> It could be the case, but the behavior is quite consistent currently,
>> I'd
>> >> get a Child error/OOM pretty soon in these cases.
>> >>
>> >>
>> >> On Tue, Feb 19, 2013 at 11:45 PM, Avery Ching <aching@apache.org>
>> wrote:
>> >>
>> >> > Can you add GC flags?  Perhaps you are running into full GCs...
>> >> >
>> >> >
>> >> > On 2/19/13 1:45 PM, Claudio Martella wrote:
>> >> >
>> >> >> I have not investigated further yet, but this does not look
>>connected
>> >> to
>> >> >> ZK, but mostly to some concurrency on the queues. The number of
>> >> requests
>> >> >> stays to 29...
>> >> >>
>> >> >>
>> >> >> On Tue, Feb 19, 2013 at 9:58 PM, Sebastian Schelter
>><ssc@apache.org>
>> >> >> wrote:
>> >> >>
>> >> >>  For me, Giraph on Hadoop 1.0.4 runs into some kind of deadlock
>>after
>> >> the
>> >> >>> input was loaded when using it with 200 workers on a
>>billion-scale
>> >> >>> graph....
>> >> >>>
>> >> >>> I tried several versions of zookeeper, but nothing helped.
>> >> >>>
>> >> >>> Best,
>> >> >>> Sebastian
>> >> >>>
>> >> >>>
>> >> >>> On 19.02.2013 21:55, Claudio Martella wrote:
>> >> >>>
>> >> >>>> I'm running PR with trunk on yahoo webgraph across 528
workers
>>and
>> I
>> >> get
>> >> >>>> the workers hanging at the flushing:
>> >> >>>>
>> >> >>>> 2013-02-19 21:40:07,187 INFO
>> >> org.apache.giraph.graph.**ComputeCallable:
>> >> >>>>
>> >> >>> call:
>> >> >>>
>> >> >>>> Computation took 28.386961 secs for 95 partitions on superstep
>>0.
>> >> >>>>
>> >> >>>   Flushing
>> >> >>>
>> >> >>>> started
>> >> >>>> 2013-02-19 21:40:07,604 INFO org.apache.giraph.worker.**
>> >> >>>> BspServiceWorker:
>> >> >>>> finishSuperstep: Waiting on all requests, superstep 0 Memory
>> >> >>>> (free/total/max) = 5993.42M / 8661.44M / 14346.69M
>> >> >>>> 2013-02-19 21:40:07,605 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 558
open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 6.4758,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0003, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 27.083
>> >> >>>> 2013-02-19 21:40:26,717 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 39 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 3.7965,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 46.195
>> >> >>>> 2013-02-19 21:40:32,248 INFO
>> >> >>>> org.apache.giraph.comm.netty.**handler.RequestDecoder:
decode:
>> Server
>> >> >>>>
>> >> >>> window
>> >> >>>
>> >> >>>> metrics MBytes/sec sent = 0.0002, MBytes/sec received =
3.9824,
>> >> >>>>
>> >> >>> MBytesSent
>> >> >>>
>> >> >>>> = 0.0066, MBytesReceived = 155.4787, ave sent req MBytes
= 0,
>>ave
>> >> >>>>
>> >> >>> received
>> >> >>>
>> >> >>>> req MBytes = 0.0155, secs waited = 39.04
>> >> >>>> 2013-02-19 21:40:42,041 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 36 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 2.8509,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 61.519
>> >> >>>> 2013-02-19 21:41:07,456 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 2.0174,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 86.933
>> >> >>>> 2013-02-19 21:41:22,456 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 1.7205,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 101.934
>> >> >>>> 2013-02-19 21:41:37,457 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 1.4998,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 116.935
>> >> >>>> 2013-02-19 21:41:52,458 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 1.3293,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069,
ave sent
>> req
>> >> >>>>
>> >> >>> MBytes
>> >> >>>
>> >> >>>> = 0.2998, ave received req MBytes = 0, secs waited = 131.936
>> >> >>>> 2013-02-19 21:42:22,459 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 1.083,
MBytes/sec
>> >> received
>> >> >>>> =
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 161.937
>> >> >>>> 2013-02-19 21:42:37,460 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.9912,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 176.938
>> >> >>>> 2013-02-19 21:43:07,461 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.8475,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 206.939
>> >> >>>> 2013-02-19 21:43:37,462 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.7402,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 236.94
>> >> >>>> 2013-02-19 21:43:52,463 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.6961,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 251.941
>> >> >>>> 2013-02-19 21:44:07,464 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.657,
MBytes/sec
>> >> received
>> >> >>>> =
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 266.942
>> >> >>>> 2013-02-19 21:44:37,465 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.5906,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 296.943
>> >> >>>> 2013-02-19 21:45:07,466 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.5364,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 326.944
>> >> >>>> 2013-02-19 21:45:22,467 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.5129,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 341.945
>> >> >>>> 2013-02-19 21:45:37,468 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.4913,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 356.946
>> >> >>>> 2013-02-19 21:46:07,469 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.4532,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 386.947
>> >> >>>> 2013-02-19 21:46:22,470 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.4363,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 401.948
>> >> >>>> 2013-02-19 21:46:52,471 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.406,
MBytes/sec
>> >> received
>> >> >>>> =
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 431.949
>> >> >>>> 2013-02-19 21:47:07,472 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.3924,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 446.95
>> >> >>>> 2013-02-19 21:47:37,473 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.3677,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 476.951
>> >> >>>> 2013-02-19 21:48:07,474 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.346,
MBytes/sec
>> >> received
>> >> >>>> =
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 506.952
>> >> >>>> 2013-02-19 21:48:22,475 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.336,
MBytes/sec
>> >> received
>> >> >>>> =
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 521.953
>> >> >>>> 2013-02-19 21:48:52,476 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.3177,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 551.954
>> >> >>>> 2013-02-19 21:49:07,477 INFO org.apache.giraph.comm.netty.**
>> >> >>>> NettyClient:
>> >> >>>> waitSomeRequests: Waiting interval of 15000 msecs, 29 open
>> requests,
>> >> >>>> waiting for it to be <= 0, MBytes/sec sent = 0.3093,
MBytes/sec
>> >> received
>> >> >>>>
>> >> >>> =
>> >> >>>
>> >> >>>> 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave
sent req
>> >> MBytes =
>> >> >>>> 0.2998, ave received req MBytes = 0, secs waited = 566.955
>> >> >>>>
>> >> >>>> It looks similar to the problem we had earlier with concurrency.
>> >> >>>>
>> >> >>>> Are you guys experiencing something similar at large-scale?
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >>    Claudio Martella
>> >>    claudio.martella@gmail.com
>> >>
>> >
>> >
>>
>
>
>
>-- 
>   Claudio Martella
>   claudio.martella@gmail.com


Mime
View raw message