giraph-dev mailing list archives

From Eli Reisman <apache.mail...@gmail.com>
Subject Re: more concurrency issues with requests management?
Date Wed, 20 Feb 2013 23:49:37 GMT
Sorry, I didn't see the bottom of this thread before! I like Maja's idea:
some mega-supernode (that's a technical term) just sent that vertex so many
messages that the byte buffer size overflowed on the right shift. I saw
this problem in some other forms last summer. The number of other ways that
number could go negative seems limited.
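
To illustrate how a buffer size can go negative, here is a minimal Java sketch. It assumes a growth policy of "old capacity plus half, computed with a right shift" (the policy java.util.ArrayList uses); whether the Giraph byte buffers grow this way is an assumption, and the class and method names are hypothetical.

```java
public class CapacityOverflowDemo {
    // Hypothetical growth policy (not taken from Giraph): grow by 1.5x,
    // computing the extra half with a right shift.
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Overflow-safe variant: do the arithmetic in long and fail loudly
    // instead of silently wrapping to a negative int.
    static int growChecked(int oldCapacity) {
        long wanted = (long) oldCapacity + (oldCapacity >> 1);
        if (wanted > Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "buffer would exceed int range: " + wanted);
        }
        return (int) wanted;
    }

    public static void main(String[] args) {
        int big = 1_500_000_000; // roughly 1.4 GB worth of bytes
        // The sum exceeds Integer.MAX_VALUE, so the int result wraps
        // around and comes out negative.
        System.out.println("unchecked: " + grow(big));
        try {
            growChecked(big);
        } catch (IllegalStateException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

With a check like growChecked, the failure would show up as an explicit exception at the point of growth rather than as a negative size somewhere downstream.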


On Wed, Feb 20, 2013 at 3:41 PM, Eli Reisman <apache.mailbox@gmail.com> wrote:

> I do know that when you run out of memory it often looks like the logs are
> fine, or seem to end in a weird spot, but really the last X number of log
> messages just don't get logged. Sometimes it's worth looking at the worker
> logs individually, and one in every group of them will actually get a
> sensible exception message logged. I don't know if that's the case though.
> I would tend to agree with your intuition regarding the queues and
> concurrency.
>
> On Tue, Feb 19, 2013 at 3:04 PM, Claudio Martella <claudio.martella@gmail.com> wrote:
>
>> Looking at the second line of the logs, it does not look like that is the
>> case, but it could be. Do you have any suggestions for the flags in
>> particular? I'm running with -XX:+HeapDumpOnOutOfMemoryError now, and I
>> don't get any dump.
>>
>> It could be the case, but the behavior is quite consistent currently; I'd
>> get a Child error/OOM pretty soon in these cases.
>>
>>
>> On Tue, Feb 19, 2013 at 11:45 PM, Avery Ching <aching@apache.org> wrote:
>>
>> > Can you add GC flags?  Perhaps you are running into full GCs...
>> >
>> >
>> > On 2/19/13 1:45 PM, Claudio Martella wrote:
>> >
>> >> I have not investigated further yet, but this does not look connected to
>> >> ZK; it looks more like a concurrency issue on the queues. The number of
>> >> requests stays at 29...
>> >>
>> >>
>> >> On Tue, Feb 19, 2013 at 9:58 PM, Sebastian Schelter <ssc@apache.org> wrote:
>> >>
>> >>> For me, Giraph on Hadoop 1.0.4 runs into some kind of deadlock after the
>> >>> input is loaded when using it with 200 workers on a billion-scale
>> >>> graph....
>> >>>
>> >>> I tried several versions of zookeeper, but nothing helped.
>> >>>
>> >>> Best,
>> >>> Sebastian
>> >>>
>> >>>
>> >>> On 19.02.2013 21:55, Claudio Martella wrote:
>> >>>
>> >>>> I'm running PR with trunk on the Yahoo webgraph across 528 workers, and
>> >>>> the workers hang during flushing:
>> >>>>
>> >>>> 2013-02-19 21:40:07,187 INFO org.apache.giraph.graph.ComputeCallable: call: Computation took 28.386961 secs for 95 partitions on superstep 0. Flushing started
>> >>>> 2013-02-19 21:40:07,604 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 5993.42M / 8661.44M / 14346.69M
>> >>>> 2013-02-19 21:40:07,605 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 558 open requests, waiting for it to be <= 0, MBytes/sec sent = 6.4758, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0003, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 27.083
>> >>>> 2013-02-19 21:40:26,717 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 39 open requests, waiting for it to be <= 0, MBytes/sec sent = 3.7965, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 46.195
>> >>>> 2013-02-19 21:40:32,248 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0.0002, MBytes/sec received = 3.9824, MBytesSent = 0.0066, MBytesReceived = 155.4787, ave sent req MBytes = 0, ave received req MBytes = 0.0155, secs waited = 39.04
>> >>>> 2013-02-19 21:40:42,041 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 36 open requests, waiting for it to be <= 0, MBytes/sec sent = 2.8509, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 61.519
>> >>>> 2013-02-19 21:41:07,456 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 2.0174, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 86.933
>> >>>> 2013-02-19 21:41:22,456 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.7205, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 101.934
>> >>>> 2013-02-19 21:41:37,457 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.4998, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 116.935
>> >>>> 2013-02-19 21:41:52,458 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.3293, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 131.936
>> >>>> 2013-02-19 21:42:22,459 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.083, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 161.937
>> >>>> 2013-02-19 21:42:37,460 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.9912, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 176.938
>> >>>> 2013-02-19 21:43:07,461 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.8475, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 206.939
>> >>>> 2013-02-19 21:43:37,462 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.7402, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 236.94
>> >>>> 2013-02-19 21:43:52,463 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.6961, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 251.941
>> >>>> 2013-02-19 21:44:07,464 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.657, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 266.942
>> >>>> 2013-02-19 21:44:37,465 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5906, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 296.943
>> >>>> 2013-02-19 21:45:07,466 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5364, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 326.944
>> >>>> 2013-02-19 21:45:22,467 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5129, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 341.945
>> >>>> 2013-02-19 21:45:37,468 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4913, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 356.946
>> >>>> 2013-02-19 21:46:07,469 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4532, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 386.947
>> >>>> 2013-02-19 21:46:22,470 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4363, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 401.948
>> >>>> 2013-02-19 21:46:52,471 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.406, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 431.949
>> >>>> 2013-02-19 21:47:07,472 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3924, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 446.95
>> >>>> 2013-02-19 21:47:37,473 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3677, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 476.951
>> >>>> 2013-02-19 21:48:07,474 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.346, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 506.952
>> >>>> 2013-02-19 21:48:22,475 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.336, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 521.953
>> >>>> 2013-02-19 21:48:52,476 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3177, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 551.954
>> >>>> 2013-02-19 21:49:07,477 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3093, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 566.955
>> >>>>
>> >>>> It looks similar to the problem we had earlier with concurrency.
>> >>>>
>> >>>> Are you guys experiencing something similar at large-scale?
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com
>>
>
>

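The stall in the quoted NettyClient output (the open-request count dropping from 558 and then sitting at 29) can be spotted mechanically. The following is a rough Java sketch, not part of Giraph: the class name, method names, and the abbreviated sample lines are all hypothetical, and the pattern only assumes the "N open requests" wording seen in the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenRequestStall {
    // Matches the "<N> open requests" fragment of a waitSomeRequests line.
    private static final Pattern OPEN =
        Pattern.compile("(\\d+) open requests");

    // Extracts the open-request count from every line that has one.
    static List<Integer> openRequestCounts(List<String> lines) {
        List<Integer> counts = new ArrayList<Integer>();
        for (String line : lines) {
            Matcher m = OPEN.matcher(line);
            if (m.find()) {
                counts.add(Integer.parseInt(m.group(1)));
            }
        }
        return counts;
    }

    // Length of the trailing run of identical counts; a long run with a
    // non-zero count suggests the worker is no longer making progress.
    static int trailingPlateau(List<Integer> counts) {
        if (counts.isEmpty()) {
            return 0;
        }
        int last = counts.get(counts.size() - 1);
        int run = 0;
        for (int i = counts.size() - 1; i >= 0 && counts.get(i) == last; i--) {
            run++;
        }
        return run;
    }

    public static void main(String[] args) {
        // Abbreviated stand-ins for the quoted log lines.
        List<String> lines = Arrays.asList(
            "waitSomeRequests: Waiting interval of 15000 msecs, 558 open requests",
            "waitSomeRequests: Waiting interval of 15000 msecs, 39 open requests",
            "decode: Server window metrics MBytes/sec sent = 0.0002",
            "waitSomeRequests: Waiting interval of 15000 msecs, 36 open requests",
            "waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests",
            "waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests",
            "waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests");
        List<Integer> counts = openRequestCounts(lines);
        System.out.println("counts = " + counts);
        System.out.println("plateau length = " + trailingPlateau(counts));
    }
}
```

Running something like this over each worker's log separately might help narrow down which workers hold the stuck requests, since a plateau at a non-zero count is the pattern reported in this thread.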