giraph-user mailing list archives

From Hai Lan <lanhai1...@gmail.com>
Subject Re: Out of core computation fails with KryoException: Buffer underflow
Date Wed, 09 Nov 2016 15:30:02 GMT
Many thanks Hassan

I tested a fixed number of partitions without isStaticGraph=true and it
works great.

I'll follow your instructions and test the adaptive mechanism next. But I
have two small questions:

1. Is there any performance difference between the fixed-number setting
and the adaptive setting?

2. As I understand it, out-of-core can only spill up to 90% of the input
graph to disk. Does that mean, for example, that a 10 TB graph can be
processed with at least 1 TB of available memory?

Thanks again,

Best,

Hai


On Tue, Nov 8, 2016 at 12:42 PM, Hassan Eslami <hsn.eslami@gmail.com> wrote:

> Hi Hai,
>
> I notice that you are trying to use the new OOC mechanism too. Here is my
> take on your issue:
>
> As mentioned earlier in the thread, we noticed there is a bug with the
> "isStaticGraph=true" option. This flag exists only for optimization
> purposes. I'll create a JIRA and send a fix for it, but for now, please
> run your job without this flag. This should help you get past the first
> superstep.
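>
> Concretely, based on the command you posted earlier in this thread, that
> just means dropping giraph.isStaticGraph=true from the -ca list, e.g.
> (a sketch reusing your own options, not a prescription):
>
> -ca mapred.job.tracker=localhost:5431,steps=6,giraph.numInputThreads=10,giraph.userPartitionCount=1000,giraph.maxPartitionsInMemory=1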
>
> As for the adaptive mechanism vs. a fixed number of partitions, both
> approaches are now acceptable in the new OOC design. If you add
> "giraph.maxPartitionsInMemory", the OOC infrastructure assumes that you
> are using a fixed number of partitions in memory and ignores any other
> OOC-related flags in your command. This is done to stay backward
> compatible with existing code that depends on OOC from the previous
> version. But be advised that this type of out-of-core execution WILL NOT
> prevent your job from failing due to spikes in messages. Also, you have
> to make sure that the number of partitions you keep in memory is small
> enough that those partitions and their messages fit in your available
> memory.
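>
> For instance, the fixed-number mode is what you get with something like
> the following (a sketch reusing the values from your own command; the
> particular numbers are yours, not a recommendation):
>
> -ca giraph.useOutOfCoreGraph=true,giraph.userPartitionCount=1000,giraph.maxPartitionsInMemory=1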
>
> On the other hand, I encourage you to use the adaptive mechanism, in which
> you do not have to specify the number of partitions in memory; the OOC
> mechanism underneath will figure things out automatically. To use the
> adaptive mechanism, you should set the following flags:
> giraph.useOutOfCoreGraph=true
> giraph.waitForRequestsConfirmation=false
> giraph.waitForPerWorkerRequests=true
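>
> For example, on your GiraphRunner command line this would look roughly
> like the following (everything else stays as in your current command):
>
> -ca giraph.useOutOfCoreGraph=true,giraph.waitForRequestsConfirmation=false,giraph.waitForPerWorkerRequests=true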
>
> I know the naming of these flags is a bit bizarre, but this sets up the
> infrastructure for message flow control, which is crucial to avoid
> failures due to messages. The default strategy for the adaptive mechanism
> is threshold based, meaning there are a number of thresholds (the default
> values are defined in the ThresholdBasedOracle class) and the system
> reacts to them. You should follow some (fairly easy) guidelines to set
> the proper thresholds for your system. Please refer to the other email
> response in the same thread for guidelines on how to set your thresholds
> properly.
>
> Hope it helps,
> Best,
> Hassan
>
> On Tue, Nov 8, 2016 at 11:01 AM, Hai Lan <lanhai1988@gmail.com> wrote:
>
>> Hello Denis
>>
>> Thanks for your quick response.
>>
>> I just tested setting the timeout to 3600000, and it seems superstep 0
>> can finish now. However, the job is killed immediately when superstep 1
>> starts. In the ZooKeeper log:
>>
>> 2016-11-08 11:54:13,569 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 198 responses of 199 needed to start superstep 1.  Reporting every 30000 msecs, 511036 more msecs left before giving up.
>> 2016-11-08 11:54:13,570 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 13 (could be master)
>> 2016-11-08 11:54:13,571 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30000 type:create cxid:0x14e81 zxid:0xc76 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir
>> 2016-11-08 11:54:13,571 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30000 type:create cxid:0x14e82 zxid:0xc77 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerUnhealthyDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerUnhealthyDir
>> 2016-11-08 11:54:21,045 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30000 type:create cxid:0x14f4b zxid:0xc79 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir
>> 2016-11-08 11:54:21,046 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30000 type:create cxid:0x14f4c zxid:0xc7a txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerUnhealthyDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerUnhealthyDir
>> 2016-11-08 11:54:21,094 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
>> 2016-11-08 11:54:21,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
>> 2016-11-08 11:54:21,097 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: [Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=22, port=30022):(v=48825003, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=51, port=30051):(v=48825003, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=87, port=30087):(v=48824999, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=99, port=30099):(v=48824999, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=159, port=30159):(v=48824999, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=189, port=30189):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=166, port=30166):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=172, port=30172):(v=48824999, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=195, port=30195):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=116, port=30116):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=154, port=30154):(v=48824999, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=2, port=30002):(v=58590001, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=123, port=30123):(v=48824999, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=52, port=30052):(v=48825001, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=188, port=30188):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=165, port=30165):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=171, port=30171):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=23, port=30023):(v=48825003, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=117, port=30117):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=20, port=30020):(v=48825003, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=89, port=30089):(v=48824999, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=53, port=30053):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=168, port=30168):(v=48824999, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=187, port=30187):(v=48824999, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=179, port=30179):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=118, port=30118):(v=48824999, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=75, port=30075):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=152, port=30152):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu 
hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=21, port=30021):(v=48825003, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=88, port=30088):(v=48824999, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=180, port=30180):(v=48824999, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=54, port=30054):(v=48824999, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=76, port=30076):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=119, port=30119):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=167, port=30167):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=153, port=30153):(v=48824999, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=196, port=30196):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=170, port=30170):(v=48824999, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=103, port=30103):(v=48824999, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=156, port=30156):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=120, port=30120):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=150, port=30150):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=67, port=30067):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=59, port=30059):(v=48824999, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=84, port=30084):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=19, port=30019):(v=48825003, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=102, port=30102):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=169, port=30169):(v=48824999, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=34, port=30034):(v=48825003, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=162, port=30162):(v=48824999, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=157, port=30157):(v=48824999, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=83, port=30083):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=151, port=30151):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=121, port=30121):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=131, port=30131):(v=48824999, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=101, port=30101):(v=48824999, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=161, port=30161):(v=48824999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=122, port=30122):(v=48824999, 
e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=158, port=30158):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=148, port=30148):(v=48824999, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=86, port=30086):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=140, port=30140):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=91, port=30091):(v=48824999, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=100, port=30100):(v=48824999, e=0),Worker(hostname=trantor06.umiacs.umd.edu hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=160, port=30160):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=149, port=30149):(v=48824999, e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=85, port=30085):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=139, port=30139):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=13, port=30013):(v=48825003, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=80, port=30080):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=92, port=30092):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=112, port=30112):(v=48824999, e=0),Worker(hostname=trantor14.umiacs.umd.edu hostOrIp=trantor14.umiacs.umd.edu, MRtaskID=147, port=30147):(v=48824999, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=184, port=30184):(v=48824999, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=8, port=30008):(v=48825004, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=45, port=30045):(v=48825003, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=58, port=30058):(v=48824999, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=32, port=30032):(v=48825003, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=106, port=30106):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=63, port=30063):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=142, port=30142):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=71, port=30071):(v=48824999, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=40, port=30040):(v=48825003, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=130, port=30130):(v=48824999, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=39, port=30039):(v=48825003, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=111, port=30111):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=93, port=30093):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, 
MRtaskID=14, port=30014):(v=48825003, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=7, port=30007):(v=48825004, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=185, port=30185):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=46, port=30046):(v=48825003, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=33, port=30033):(v=48825003, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=105, port=30105):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=62, port=30062):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=141, port=30141):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=70, port=30070):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=135, port=30135):(v=48824999, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=186, port=30186):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=94, port=30094):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=114, port=30114):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=43, port=30043):(v=48825003, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=30, port=30030):(v=48825003, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=128, port=30128):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=144, port=30144):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=69, port=30069):(v=48824999, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=194, port=30194):(v=48824999, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=10, port=30010):(v=48825004, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=132, port=30132):(v=48824999, e=0),Worker(hostname=trantor18.umiacs.umd.edu hostOrIp=trantor18.umiacs.umd.edu, MRtaskID=104, port=30104):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=61, port=30061):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=11, port=30011):(v=48825003, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=42, port=30042):(v=48825003, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=178, port=30178):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=12, port=30012):(v=48825003, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=134, port=30134):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=113, port=30113):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=44, port=30044):(v=48825003, e=0),Worker(hostname=trantor06.umiacs.umd.edu 
hostOrIp=trantor06.umiacs.umd.edu, MRtaskID=155, port=30155):(v=48824999, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=31, port=30031):(v=48825003, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=68, port=30068):(v=48824999, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=9, port=30009):(v=48825004, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=95, port=30095):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=143, port=30143):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=133, port=30133):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=60, port=30060):(v=48824999, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=129, port=30129):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=177, port=30177):(v=48824999, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=41, port=30041):(v=48825003, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=198, port=30198):(v=48824999, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=193, port=30193):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=145, port=30145):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=66, port=30066):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=49, port=30049):(v=48825003, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=28, port=30028):(v=48825003, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=4, port=30004):(v=58590001, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=26, port=30026):(v=48825003, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=181, port=30181):(v=48824999, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=55, port=30055):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=96, port=30096):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=17, port=30017):(v=48825003, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=36, port=30036):(v=48825003, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=176, port=30176):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=108, port=30108):(v=48824999, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=77, port=30077):(v=48824999, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=126, port=30126):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=136, port=30136):(v=48824999, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=197, port=30197):(v=48824999, 
e=0),Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=90, port=30090):(v=48824999, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=29, port=30029):(v=48825003, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=192, port=30192):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=50, port=30050):(v=48825003, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=3, port=30003):(v=58590001, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=56, port=30056):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=97, port=30097):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=18, port=30018):(v=48825003, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=127, port=30127):(v=48824999, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=35, port=30035):(v=48825003, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=78, port=30078):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=175, port=30175):(v=48824999, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=82, port=30082):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=107, port=30107):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=74, port=30074):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=47, port=30047):(v=48825003, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=1, port=30001):(v=58589999, e=0),Worker(hostname=trantor23.umiacs.umd.edu hostOrIp=trantor23.umiacs.umd.edu, MRtaskID=115, port=30115):(v=48824999, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=38, port=30038):(v=48825003, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=110, port=30110):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=15, port=30015):(v=48825003, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=124, port=30124):(v=48824999, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=6, port=30006):(v=48825004, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=191, port=30191):(v=48824999, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=182, port=30182):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=174, port=30174):(v=48824999, e=0),Worker(hostname=trantor24.umiacs.umd.edu hostOrIp=trantor24.umiacs.umd.edu, MRtaskID=98, port=30098):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=164, port=30164):(v=48824999, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=65, port=30065):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=24, 
port=30024):(v=48825003, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=79, port=30079):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=73, port=30073):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=138, port=30138):(v=48824999, e=0),Worker(hostname=trantor00.umiacs.umd.edu hostOrIp=trantor00.umiacs.umd.edu, MRtaskID=81, port=30081):(v=48824999, e=0),Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=5, port=30005):(v=58590001, e=0),Worker(hostname=trantor12.umiacs.umd.edu hostOrIp=trantor12.umiacs.umd.edu, MRtaskID=190, port=30190):(v=48824999, e=0),Worker(hostname=trantor07.umiacs.umd.edu hostOrIp=trantor07.umiacs.umd.edu, MRtaskID=27, port=30027):(v=48825003, e=0),Worker(hostname=trantor10.umiacs.umd.edu hostOrIp=trantor10.umiacs.umd.edu, MRtaskID=37, port=30037):(v=48825003, e=0),Worker(hostname=trantor04.umiacs.umd.edu hostOrIp=trantor04.umiacs.umd.edu, MRtaskID=199, port=30199):(v=48824999, e=0),Worker(hostname=trantor22.umiacs.umd.edu hostOrIp=trantor22.umiacs.umd.edu, MRtaskID=146, port=30146):(v=48824999, e=0),Worker(hostname=trantor08.umiacs.umd.edu hostOrIp=trantor08.umiacs.umd.edu, MRtaskID=57, port=30057):(v=48824999, e=0),Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=trantor02.umiacs.umd.edu, MRtaskID=48, port=30048):(v=48825003, e=0),Worker(hostname=trantor05.umiacs.umd.edu hostOrIp=trantor05.umiacs.umd.edu, MRtaskID=183, port=30183):(v=48824999, e=0),Worker(hostname=trantor13.umiacs.umd.edu hostOrIp=trantor13.umiacs.umd.edu, MRtaskID=173, port=30173):(v=48824999, e=0),Worker(hostname=trantor20.umiacs.umd.edu hostOrIp=trantor20.umiacs.umd.edu, MRtaskID=25, port=30025):(v=48825003, e=0),Worker(hostname=trantor01.umiacs.umd.edu hostOrIp=trantor01.umiacs.umd.edu, MRtaskID=64, port=30064):(v=48824999, e=0),Worker(hostname=trantor03.umiacs.umd.edu hostOrIp=trantor03.umiacs.umd.edu, MRtaskID=16, port=30016):(v=48825003, e=0),Worker(hostname=trantor15.umiacs.umd.edu hostOrIp=trantor15.umiacs.umd.edu, MRtaskID=125, port=30125):(v=48824999, e=0),Worker(hostname=trantor16.umiacs.umd.edu hostOrIp=trantor16.umiacs.umd.edu, MRtaskID=109, port=30109):(v=48824999, e=0),Worker(hostname=trantor21.umiacs.umd.edu hostOrIp=trantor21.umiacs.umd.edu, MRtaskID=163, port=30163):(v=48824999, e=0),Worker(hostname=trantor09.umiacs.umd.edu hostOrIp=trantor09.umiacs.umd.edu, MRtaskID=72, port=30072):(v=48824999, e=0),Worker(hostname=trantor19.umiacs.umd.edu hostOrIp=trantor19.umiacs.umd.edu, MRtaskID=137, port=30137):(v=48824999, e=0),]
>> 2016-11-08 11:54:21,098 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 49070351, Min: Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=87, port=30087) - 48824999, Max: Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=5, port=30005) - 58590001
>> 2016-11-08 11:54:21,098 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 0, Min: Worker(hostname=trantor17.umiacs.umd.edu hostOrIp=trantor17.umiacs.umd.edu, MRtaskID=87, port=30087) - 0, Max: Worker(hostname=trantor11.umiacs.umd.edu hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=5, port=30005) - 0
>> 2016-11-08 11:54:21,104 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 199 workers finished on superstep 1 on path /_hadoopBsp/job_1477020594559_0051/_applicationAttemptsDir/0/_superstepDir/1/_workerFinishedDir
>> 2016-11-08 11:54:29,090 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_applicationAttemptKey":-1,"_stateKey":"FAILED","_superstepKey":-1} on superstep 1
>> 2016-11-08 11:54:29,094 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30044 type:create cxid:0x1b zxid:0xd46 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_masterJobState
>> 2016-11-08 11:54:29,094 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_applicationAttemptKey":-1,"_stateKey":"FAILED","_superstepKey":-1}
>> 2016-11-08 11:54:29,096 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba3004a type:create cxid:0x1b zxid:0xd47 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_masterJobState
>> 2016-11-08 11:54:29,096 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba3004f type:create cxid:0x1b zxid:0xd48 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_masterJobState
>> 2016-11-08 11:54:29,096 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15844b61ba30054 type:create cxid:0x1b zxid:0xd49 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1477020594559_0051/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1477020594559_0051/_masterJobState
>> 2016-11-08 11:54:29,096 FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: failJob: Killing job job_1477020594559_0051
>>
>>
>> Any other ideas?
>>
>>
>> Thanks,
>>
>>
>> BR,
>>
>>
>> Hai
>>
>>
>>
>>
>>
>> On Tue, Nov 8, 2016 at 9:48 AM, Denis Dudinski <denis.dudinski@gmail.com>
>> wrote:
>>
>>> Hi Hai,
>>>
>>> I think we saw something like this in our environment.
>>>
>>> The interesting line is this one:
>>> 2016-10-27 19:04:00,000 INFO [SessionTracker]
>>> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>>> 0x158084f5b2100b8, timeout of 600000ms exceeded
>>>
>>> I think that one of the workers did not communicate with ZooKeeper for
>>> quite a long time for some reason (it may be heavy network load or
>>> high CPU consumption; check your monitoring infrastructure, it should
>>> give you a hint). The ZooKeeper session then expires and all ephemeral
>>> nodes for that worker in the ZooKeeper tree are deleted. The master
>>> thinks the worker is dead and halts the computation.
>>>
>>> Your ZooKeeper session timeout is 600000 ms, which is 10 minutes. We
>>> set this value much higher, to 1 hour, and were able to perform
>>> computations successfully.
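>>>
>>> (For reference, a sketch of how the timeout can be raised from the
>>> Giraph command line; I believe the property is
>>> giraph.zkSessionMsecTimeout, but please double-check the constant in
>>> GiraphConstants for your Giraph version:
>>> -ca giraph.zkSessionMsecTimeout=3600000 )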
>>>
>>> I hope it will help in your case too.
>>>
>>> Best Regards,
>>> Denis Dudinski
>>>
>>> 2016-11-08 16:43 GMT+03:00 Hai Lan <lanhai1988@gmail.com>:
>>> > Hi Guys
>>> >
>>> > The OutOfMemoryError might be solved by adding
>>> > "-Dmapreduce.map.memory.mb=14848". But in my tests, I found some more
>>> > problems while running the out-of-core graph.
>>> >
>>> > I did two tests with a 150 GB, 10^10-vertex input on version 1.2, and
>>> > it seems it should not be necessary to add something like
>>> > "giraph.userPartitionCount=1000,giraph.maxPartitionsInMemory=1" since
>>> > it is adaptive. However, if I run without setting userPartitionCount
>>> > and maxPartitionsInMemory, it will keep running on superstep -1
>>> > forever. None of the workers can finish superstep -1, and I can see a
>>> > warning in the ZooKeeper log; not sure if it is the problem:
>>> >
>>> > WARN [netty-client-worker-3]
>>> > org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught:
>>> > Channel failed with remote address trantor21.umiacs.umd.edu/192.168.74.221:30172
>>> > java.lang.ArrayIndexOutOfBoundsException: 1075052544
>>> >       at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)
>>> >       at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:796)
>>> >       at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>>> >       at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>>> >       at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:74)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>>> >       at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>>> >       at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
>>> >       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
>>> >       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
>>> >       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
>>> >       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
>>> >       at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>>> >       at java.lang.Thread.run(Thread.java:745)
>>> >
>>> >
>>> >
>>> > If I add giraph.userPartitionCount=1000,giraph.maxPartitionsInMemory=1,
>>> > the whole command is:
>>> >
>>> > hadoop jar
>>> > /home/hlan/giraph-1.2.0-hadoop2/giraph-examples/target/giraph-examples-1.2.0-hadoop2-for-hadoop-2.6.0-jar-with-dependencies.jar
>>> > org.apache.giraph.GiraphRunner -Dgiraph.useOutOfCoreGraph=true
>>> > -Ddigraph.block_factory_configurators=org.apache.giraph.conf.FacebookConfiguration
>>> > -Dmapreduce.map.memory.mb=14848 org.apache.giraph.examples.myTask -vif
>>> > org.apache.giraph.examples.LongFloatNullTextInputFormat -vip
>>> > /user/hlan/cube/tmp/out/ -vof
>>> > org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> > /user/hlan/output -w 199 -ca
>>> > mapred.job.tracker=localhost:5431,steps=6,giraph.isStaticGraph=true,giraph.numInputThreads=10,giraph.userPartitionCount=1000,giraph.maxPartitionsInMemory=1
>>> >
>>> > the job passes superstep -1 very quickly (around 10 mins), but it is
>>> > killed near the end of superstep 0.
>>> >
>>> > 2016-10-27 18:53:56,607 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.partition.PartitionUtils: analyzePartitionStats:
>>> Vertices
>>> > - Mean: 9810049, Min: Worker(hostname=trantor11.umiacs.umd.edu
>>> > hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=10, port=30010) -
>>> 9771533, Max:
>>> > Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=
>>> trantor02.umiacs.umd.edu,
>>> > MRtaskID=49, port=30049) - 9995724
>>> > 2016-10-27 18:53:56,608 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.partition.PartitionUtils: analyzePartitionStats:
>>> Edges -
>>> > Mean: 0, Min: Worker(hostname=trantor11.umiacs.umd.edu
>>> > hostOrIp=trantor11.umiacs.umd.edu, MRtaskID=10, port=30010) - 0, Max:
>>> > Worker(hostname=trantor02.umiacs.umd.edu hostOrIp=
>>> trantor02.umiacs.umd.edu,
>>> > MRtaskID=49, port=30049) - 0
>>> > 2016-10-27 18:53:56,634 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:54:26,638 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:54:56,640 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:55:26,641 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:55:56,642 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:56:26,643 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:56:56,644 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:57:26,645 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:57:56,646 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:58:26,647 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:58:56,675 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:59:26,676 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 18:59:56,677 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:00:26,678 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:00:56,679 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:01:26,680 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:01:29,610 WARN [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: caught end of stream
>>> exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> > 0x158084f5b2100c6, likely client has closed socket
>>> > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn
>>> .java:220)
>>> > at
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
>>> erCnxnFactory.java:208)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > 2016-10-27 19:01:29,612 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.212:53136 which had sessionid 0x158084f5b2100c6
>>> > 2016-10-27 19:01:31,702 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
>>> connection
>>> > from /192.168.74.212:56696
>>> > 2016-10-27 19:01:31,711 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Client attempting to
>>> renew
>>> > session 0x158084f5b2100c6 at /192.168.74.212:56696
>>> > 2016-10-27 19:01:31,712 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Established session
>>> > 0x158084f5b2100c6 with negotiated timeout 600000 for client
>>> > /192.168.74.212:56696
>>> > 2016-10-27 19:01:56,681 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:02:20,029 WARN [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: caught end of stream
>>> exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> > 0x158084f5b2100c5, likely client has closed socket
>>> > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn
>>> .java:220)
>>> > at
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
>>> erCnxnFactory.java:208)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > 2016-10-27 19:02:20,030 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.212:53134 which had sessionid 0x158084f5b2100c5
>>> > 2016-10-27 19:02:21,584 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
>>> connection
>>> > from /192.168.74.212:56718
>>> > 2016-10-27 19:02:21,608 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Client attempting to
>>> renew
>>> > session 0x158084f5b2100c5 at /192.168.74.212:56718
>>> > 2016-10-27 19:02:21,608 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Established session
>>> > 0x158084f5b2100c5 with negotiated timeout 600000 for client
>>> > /192.168.74.212:56718
>>> > 2016-10-27 19:02:26,682 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:02:56,683 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:03:05,743 WARN [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: caught end of stream
>>> exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> > 0x158084f5b2100b9, likely client has closed socket
>>> > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn
>>> .java:220)
>>> > at
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
>>> erCnxnFactory.java:208)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > 2016-10-27 19:03:05,744 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.203:51130 which had sessionid 0x158084f5b2100b9
>>> > 2016-10-27 19:03:07,452 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
>>> connection
>>> > from /192.168.74.203:54676
>>> > 2016-10-27 19:03:07,493 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Client attempting to
>>> renew
>>> > session 0x158084f5b2100b9 at /192.168.74.203:54676
>>> > 2016-10-27 19:03:07,494 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Established session
>>> > 0x158084f5b2100b9 with negotiated timeout 600000 for client
>>> > /192.168.74.203:54676
>>> > 2016-10-27 19:03:26,684 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:03:53,712 WARN [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: caught end of stream
>>> exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> > 0x158084f5b2100be, likely client has closed socket
>>> > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn
>>> .java:220)
>>> > at
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServ
>>> erCnxnFactory.java:208)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > 2016-10-27 19:03:53,713 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.203:51146 which had sessionid 0x158084f5b2100be
>>> > 2016-10-27 19:03:55,436 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
>>> connection
>>> > from /192.168.74.203:54694
>>> > 2016-10-27 19:03:55,482 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Client attempting to
>>> renew
>>> > session 0x158084f5b2100be at /192.168.74.203:54694
>>> > 2016-10-27 19:03:55,483 INFO [NIOServerCxn.Factory:0.0.0.0/
>>> 0.0.0.0:22181]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Established session
>>> > 0x158084f5b2100be with negotiated timeout 600000 for client
>>> > /192.168.74.203:54694
>>> > 2016-10-27 19:03:56,719 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
>>> of 199
>>> > workers finished on superstep 0 on path
>>> > /_hadoopBsp/job_1477020594559_0049/_applicationAttemptsDir/0
>>> /_superstepDir/0/_workerFinishedDir
>>> > 2016-10-27 19:04:00,000 INFO [SessionTracker]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>>> > 0x158084f5b2100b8, timeout of 600000ms exceeded
>>> > 2016-10-27 19:04:00,001 INFO [SessionTracker]
>>> > org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>>> > 0x158084f5b2100c2, timeout of 600000ms exceeded
>>> > 2016-10-27 19:04:00,002 INFO [ProcessThread(sid:0 cport:-1):]
>>> > org.apache.zookeeper.server.PrepRequestProcessor: Processed session
>>> > termination for sessionid: 0x158084f5b2100b8
>>> > 2016-10-27 19:04:00,002 INFO [ProcessThread(sid:0 cport:-1):]
>>> > org.apache.zookeeper.server.PrepRequestProcessor: Processed session
>>> > termination for sessionid: 0x158084f5b2100c2
>>> > 2016-10-27 19:04:00,004 INFO [SyncThread:0]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.203:51116 which had sessionid 0x158084f5b2100b8
>>> > 2016-10-27 19:04:00,006 INFO [SyncThread:0]
>>> > org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>>> for
>>> > client /192.168.74.212:53128 which had sessionid 0x158084f5b2100c2
>>> > 2016-10-27 19:04:00,033 INFO [org.apache.giraph.master.MasterThread]
>>> > org.apache.giraph.master.BspServiceMaster: setJobState:
>>> > {"_applicationAttemptKey":-1,"_stateKey":"FAILED","_superstepKey":-1}
>>> on
>>> > superstep 0
>>> >
>>> > Any idea about this?
>>> >
>>> > Thanks,
>>> >
>>> > Hai
>>> >
>>> >
>>> > On Tue, Nov 8, 2016 at 6:37 AM, Denis Dudinski <
>>> denis.dudinski@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi Xenia,
>>> >>
>>> >> Thank you! I'll check the thread you mentioned.
>>> >>
>>> >> Best Regards,
>>> >> Denis Dudinski
>>> >>
>>> >> 2016-11-08 14:16 GMT+03:00 Xenia Demetriou <xeniad20@gmail.com>:
>>> >> > Hi Denis,
>>> >> >
>>> >> > For the "java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>> >> > error, I hope the conversation at the link below can help you:
>>> >> > www.mail-archive.com/user@giraph.apache.org/msg02938.html
>>> >> >
>>> >> > Regards,
>>> >> > Xenia
>>> >> >
>>> >> > 2016-11-08 12:25 GMT+02:00 Denis Dudinski <denis.dudinski@gmail.com
>>> >:
>>> >> >>
>>> >> >> Hi Hassan,
>>> >> >>
>>> >> >> Thank you for the really quick response!
>>> >> >>
>>> >> >> I changed "giraph.isStaticGraph" to false and the error disappeared.
>>> >> >> As expected, the iteration became slower and wrote the edges to disk
>>> >> >> once again in superstep 1.
>>> >> >>
>>> >> >> However, the computation failed at superstep 2 with the error
>>> >> >> "java.lang.OutOfMemoryError: GC overhead limit exceeded". It seems to
>>> >> >> be unrelated to the "isStaticGraph" issue, but I think it is worth
>>> >> >> mentioning to see the picture as a whole.
>>> >> >>
>>> >> >> Are there any other tests I can run or information I can check to
>>> >> >> help pinpoint the "isStaticGraph" problem?
>>> >> >>
>>> >> >> Best Regards,
>>> >> >> Denis Dudinski
>>> >> >>
>>> >> >>
>>> >> >> 2016-11-07 20:00 GMT+03:00 Hassan Eslami <hsn.eslami@gmail.com>:
>>> >> >> > Hi Denis,
>>> >> >> >
>>> >> >> > Thanks for bringing up the issue. In a previous conversation thread,
>>> >> >> > a similar problem was reported even with a simpler connected
>>> >> >> > components example. Back then, though, we were developing other
>>> >> >> > performance-critical components of OOC.
>>> >> >> >
>>> >> >> > Let's debug this issue together to make the new OOC more stable. I
>>> >> >> > suspect the problem is with "giraph.isStaticGraph=true" (as this is
>>> >> >> > only an optimization, and most of our end-to-end testing was on
>>> >> >> > cases where the graph could change). Let's get rid of it for now and
>>> >> >> > see if the problem still exists.
>>> >> >> >
>>> >> >> > Best,
>>> >> >> > Hassan
>>> >> >> >
>>> >> >> > On Mon, Nov 7, 2016 at 6:24 AM, Denis Dudinski
>>> >> >> > <denis.dudinski@gmail.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> Hello,
>>> >> >> >>
>>> >> >> >> We are trying to calculate PageRank on a huge graph which does not
>>> >> >> >> fit into memory. For the calculation to succeed we tried to turn on
>>> >> >> >> the out-of-core feature of Giraph, but every launch we tried
>>> >> >> >> resulted in com.esotericsoftware.kryo.KryoException: Buffer
>>> >> >> >> underflow. Each time it happens on a different server, but always
>>> >> >> >> right after the start of superstep 1.
>>> >> >> >>
>>> >> >> >> We are using Giraph 1.2.0 on Hadoop 2.7.3 (our production version;
>>> >> >> >> we can't step back to Giraph's officially supported version and had
>>> >> >> >> to patch Giraph a little), deployed on 11 servers + 3 master
>>> >> >> >> servers (namenodes etc.) with a separate ZooKeeper cluster.
>>> >> >> >>
>>> >> >> >> Our launch command:
>>> >> >> >>
>>> >> >> >> hadoop jar /opt/giraph-1.2.0/pr-job-jar-with-dependencies.jar
>>> >> >> >> org.apache.giraph.GiraphRunner
>>> >> >> >> com.prototype.di.pr.PageRankComputation \
>>> >> >> >> -mc com.prototype.di.pr.PageRankMasterCompute \
>>> >> >> >> -yj pr-job-jar-with-dependencies.jar \
>>> >> >> >> -vif com.belprime.di.pr.input.HBLongVertexInputFormat \
>>> >> >> >> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>> >> >> >> -op /user/hadoop/output/pr_test \
>>> >> >> >> -w 10 \
>>> >> >> >> -c com.prototype.di.pr.PRDoubleCombiner \
>>> >> >> >> -wc com.prototype.di.pr.PageRankWorkerContext \
>>> >> >> >> -ca hbase.rootdir=hdfs://namenode1.webmeup.com:8020/hbase \
>>> >> >> >> -ca giraph.logLevel=info \
>>> >> >> >> -ca hbase.mapreduce.inputtable=di_test \
>>> >> >> >> -ca hbase.mapreduce.scan.columns=di:n \
>>> >> >> >> -ca hbase.defaults.for.version.skip=true \
>>> >> >> >> -ca hbase.table.row.textkey=false \
>>> >> >> >> -ca giraph.yarn.task.heap.mb=48000 \
>>> >> >> >> -ca giraph.isStaticGraph=true \
>>> >> >> >> -ca giraph.SplitMasterWorker=false \
>>> >> >> >> -ca giraph.oneToAllMsgSending=true \
>>> >> >> >> -ca giraph.metrics.enable=true \
>>> >> >> >> -ca giraph.jmap.histo.enable=true \
>>> >> >> >> -ca giraph.vertexIdClass=com.prototype.di.pr.DomainPartAwareLongWritable \
>>> >> >> >> -ca giraph.outgoingMessageValueClass=org.apache.hadoop.io.DoubleWritable \
>>> >> >> >> -ca giraph.inputOutEdgesClass=org.apache.giraph.edge.LongNullArrayEdges \
>>> >> >> >> -ca giraph.useOutOfCoreGraph=true \
>>> >> >> >> -ca giraph.waitForPerWorkerRequests=true \
>>> >> >> >> -ca giraph.maxNumberOfUnsentRequests=1000 \
>>> >> >> >> -ca giraph.vertexInputFilterClass=com.prototype.di.pr.PagesFromSameDomainLimiter \
>>> >> >> >> -ca giraph.useInputSplitLocality=true \
>>> >> >> >> -ca hbase.mapreduce.scan.cachedrows=10000 \
>>> >> >> >> -ca giraph.minPartitionsPerComputeThread=60 \
>>> >> >> >> -ca giraph.graphPartitionerFactoryClass=com.prototype.di.pr.DomainAwareGraphPartitionerFactory \
>>> >> >> >> -ca giraph.numInputThreads=1 \
>>> >> >> >> -ca giraph.inputSplitSamplePercent=20 \
>>> >> >> >> -ca giraph.pr.maxNeighborsPerVertex=50 \
>>> >> >> >> -ca giraph.partitionClass=org.apache.giraph.partition.ByteArrayPartition \
>>> >> >> >> -ca giraph.vertexClass=org.apache.giraph.graph.ByteValueVertex \
>>> >> >> >> -ca giraph.partitionsDirectory=/disk1/_bsp/_partitions,/disk2/_bsp/_partitions
>>> >> >> >>
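>>> >> >> >> The com.prototype.di.pr.PageRankComputation referenced above is our own
>>> >> >> >> class and is not included here. For context, a minimal Giraph PageRank
>>> >> >> >> computation in the same spirit is sketched below; the LongWritable ids,
>>> >> >> >> the 0.85 damping factor and the fixed superstep cap are illustrative
>>> >> >> >> assumptions, not our exact code:
>>> >> >> >>
>>> >> >> >> import org.apache.giraph.graph.BasicComputation;
>>> >> >> >> import org.apache.giraph.graph.Vertex;
>>> >> >> >> import org.apache.hadoop.io.DoubleWritable;
>>> >> >> >> import org.apache.hadoop.io.LongWritable;
>>> >> >> >> import org.apache.hadoop.io.NullWritable;
>>> >> >> >>
>>> >> >> >> public class PageRankSketch extends
>>> >> >> >>     BasicComputation<LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
>>> >> >> >>   private static final int MAX_SUPERSTEPS = 30;  // illustrative cap
>>> >> >> >>   private static final double DAMPING = 0.85;    // illustrative damping
>>> >> >> >>
>>> >> >> >>   @Override
>>> >> >> >>   public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
>>> >> >> >>       Iterable<DoubleWritable> messages) {
>>> >> >> >>     if (getSuperstep() >= 1) {
>>> >> >> >>       // Sum the incoming rank contributions from the previous superstep.
>>> >> >> >>       double sum = 0;
>>> >> >> >>       for (DoubleWritable msg : messages) {
>>> >> >> >>         sum += msg.get();
>>> >> >> >>       }
>>> >> >> >>       vertex.setValue(new DoubleWritable(
>>> >> >> >>           (1 - DAMPING) / getTotalNumVertices() + DAMPING * sum));
>>> >> >> >>     }
>>> >> >> >>     if (getSuperstep() < MAX_SUPERSTEPS) {
>>> >> >> >>       // Spread the current rank evenly over the outgoing edges.
>>> >> >> >>       sendMessageToAllEdges(vertex,
>>> >> >> >>           new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
>>> >> >> >>     } else {
>>> >> >> >>       vertex.voteToHalt();
>>> >> >> >>     }
>>> >> >> >>   }
>>> >> >> >> }
>>> >> >> >>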
>>> >> >> >> Logs excerpt:
>>> >> >> >>
>>> >> >> >> 16/11/06 15:47:15 INFO pr.PageRankWorkerContext: Pre superstep in worker context
>>> >> >> >> 16/11/06 15:47:15 INFO graph.GraphTaskManager: execute: 60 partitions to process with 1 compute thread(s), originally 1 thread(s) on superstep 1
>>> >> >> >> 16/11/06 15:47:15 INFO ooc.OutOfCoreEngine: startIteration: with 60 partitions in memory and 1 active threads
>>> >> >> >> 16/11/06 15:47:15 INFO pr.PageRankComputation: Pre superstep1 in PR computation
>>> >> >> >> 16/11/06 15:47:15 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.75
>>> >> >> >> 16/11/06 15:47:16 INFO ooc.OutOfCoreEngine: updateActiveThreadsFraction: updating the number of active threads to 1
>>> >> >> >> 16/11/06 15:47:16 INFO policy.ThresholdBasedOracle: updateRequestsCredit: updating the credit to 20
>>> >> >> >> 16/11/06 15:47:17 INFO graph.GraphTaskManager: installGCMonitoring: name = PS Scavenge, action = end of minor GC, cause = Allocation Failure, duration = 937ms
>>> >> >> >> 16/11/06 15:47:17 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.72
>>> >> >> >> 16/11/06 15:47:18 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.74
>>> >> >> >> 16/11/06 15:47:18 INFO ooc.OutOfCoreEngine: updateActiveThreadsFraction: updating the number of active threads to 1
>>> >> >> >> 16/11/06 15:47:18 INFO policy.ThresholdBasedOracle: updateRequestsCredit: updating the credit to 20
>>> >> >> >> 16/11/06 15:47:19 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.76
>>> >> >> >> 16/11/06 15:47:19 INFO ooc.OutOfCoreEngine: doneProcessingPartition: processing partition 234 is done!
>>> >> >> >> 16/11/06 15:47:20 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.79
>>> >> >> >> 16/11/06 15:47:21 INFO ooc.OutOfCoreEngine: updateActiveThreadsFraction: updating the number of active threads to 1
>>> >> >> >> 16/11/06 15:47:21 INFO policy.ThresholdBasedOracle: updateRequestsCredit: updating the credit to 18
>>> >> >> >> 16/11/06 15:47:21 INFO handler.RequestDecoder: decode: Server window metrics MBytes/sec received = 1.0994, MBytesReceived = 33.0459, ave received req MBytes = 0.0138, secs waited = 30.058
>>> >> >> >> 16/11/06 15:47:21 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.82
>>> >> >> >> 16/11/06 15:47:21 INFO ooc.OutOfCoreIOCallable: call: thread 0's next IO command is: StorePartitionIOCommand: (partitionId = 234)
>>> >> >> >> 16/11/06 15:47:21 INFO ooc.OutOfCoreIOCallable: call: thread 0's command StorePartitionIOCommand: (partitionId = 234) completed: bytes= 64419740, duration=351, bandwidth=175.03, bandwidth (excluding GC time)=175.03
>>> >> >> >> 16/11/06 15:47:21 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.83
>>> >> >> >> 16/11/06 15:47:21 INFO ooc.OutOfCoreIOCallable: call: thread 0's next IO command is: StoreIncomingMessageIOCommand: (partitionId = 234)
>>> >> >> >> 16/11/06 15:47:21 INFO ooc.OutOfCoreIOCallable: call: thread 0's command StoreIncomingMessageIOCommand: (partitionId = 234) completed: bytes= 0, duration=0, bandwidth=NaN, bandwidth (excluding GC time)=NaN
>>> >> >> >> 16/11/06 15:47:21 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.83
>>> >> >> >> 16/11/06 15:47:40 INFO graph.GraphTaskManager: installGCMonitoring: name = PS Scavenge, action = end of minor GC, cause = Allocation Failure, duration = 3107ms
>>> >> >> >> 16/11/06 15:47:40 INFO graph.GraphTaskManager: installGCMonitoring: name = PS MarkSweep, action = end of major GC, cause = Ergonomics, duration = 15064ms
>>> >> >> >> 16/11/06 15:47:40 INFO ooc.OutOfCoreEngine: updateActiveThreadsFraction: updating the number of active threads to 1
>>> >> >> >> 16/11/06 15:47:40 INFO policy.ThresholdBasedOracle: updateRequestsCredit: updating the credit to 20
>>> >> >> >> 16/11/06 15:47:40 INFO policy.ThresholdBasedOracle: getNextIOActions: usedMemoryFraction = 0.71
>>> >> >> >> 16/11/06 15:47:40 INFO ooc.OutOfCoreIOCallable: call: thread 0's next IO command is: LoadPartitionIOCommand: (partitionId = 234, superstep = 2)
>>> >> >> >> JMap histo dump at Sun Nov 06 15:47:41 CET 2016
>>> >> >> >> 16/11/06 15:47:41 INFO ooc.OutOfCoreEngine: doneProcessingPartition: processing partition 364 is done!
>>> >> >> >> 16/11/06 15:47:48 INFO ooc.OutOfCoreEngine: updateActiveThreadsFraction: updating the number of active threads to 1
>>> >> >> >> 16/11/06 15:47:48 INFO policy.ThresholdBasedOracle: updateRequestsCredit: updating the credit to 20
>>> >> >> >> --
>>> >> >> >> -- num     #instances         #bytes  class name
>>> >> >> >> -- ----------------------------------------------
>>> >> >> >> --   1:     224004229    10752202992  java.util.concurrent.ConcurrentHashMap$Node
>>> >> >> >> --   2:      19751666     6645730528  [B
>>> >> >> >> --   3:     222135985     5331263640  com.belprime.di.pr.DomainPartAwareLongWritable
>>> >> >> >> --   4:     214686483     5152475592  org.apache.hadoop.io.DoubleWritable
>>> >> >> >> --   5:           353     4357261784  [Ljava.util.concurrent.ConcurrentHashMap$Node;
>>> >> >> >> --   6:        486266      204484688  [I
>>> >> >> >> --   7:       6017652      192564864  org.apache.giraph.utils.UnsafeByteArrayOutputStream
>>> >> >> >> --   8:       3986203      159448120  org.apache.giraph.utils.UnsafeByteArrayInputStream
>>> >> >> >> --   9:       2064182      148621104  org.apache.giraph.graph.ByteValueVertex
>>> >> >> >> --  10:       2064182       82567280  org.apache.giraph.edge.ByteArrayEdges
>>> >> >> >> --  11:       1886875       45285000  java.lang.Integer
>>> >> >> >> --  12:        349409       30747992  java.util.concurrent.ConcurrentHashMap$TreeNode
>>> >> >> >> --  13:        916970       29343040  java.util.Collections$1
>>> >> >> >> --  14:        916971       22007304  java.util.Collections$SingletonSet
>>> >> >> >> --  15:         47270        3781600  java.util.concurrent.ConcurrentHashMap$TreeBin
>>> >> >> >> --  16:         26201        2590912  [C
>>> >> >> >> --  17:         34175        1367000  org.apache.giraph.edge.ByteArrayEdges$ByteArrayEdgeIterator
>>> >> >> >> --  18:          6143        1067704  java.lang.Class
>>> >> >> >> --  19:         25953         830496  java.lang.String
>>> >> >> >> --  20:         34175         820200  org.apache.giraph.edge.EdgeNoValue
>>> >> >> >> --  21:          4488         703400  [Ljava.lang.Object;
>>> >> >> >> --  22:            70         395424  [Ljava.nio.channels.SelectionKey;
>>> >> >> >> --  23:          2052         328320  java.lang.reflect.Method
>>> >> >> >> --  24:          6600         316800  org.apache.giraph.utils.ByteArrayVertexIdMessages
>>> >> >> >> --  25:          5781         277488  java.util.HashMap$Node
>>> >> >> >> --  26:          5651         271248  java.util.Hashtable$Entry
>>> >> >> >> --  27:          6604         211328  org.apache.giraph.factories.DefaultMessageValueFactory
>>> >> >> >> 16/11/06 15:47:49 ERROR utils.LogStacktraceCallable: Execution of callable failed
>>> >> >> >> java.lang.RuntimeException: call: execution of IO command LoadPartitionIOCommand: (partitionId = 234, superstep = 2) failed!
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:115)
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:36)
>>> >> >> >> at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
>>> >> >> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> >> >> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> >> >> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> >> >> >> at java.lang.Thread.run(Thread.java:745)
>>> >> >> >> Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
>>> >> >> >> at com.esotericsoftware.kryo.io.Input.require(Input.java:199)
>>> >> >> >> at com.esotericsoftware.kryo.io.UnsafeInput.readLong(UnsafeInput.java:112)
>>> >> >> >> at com.esotericsoftware.kryo.io.KryoDataInput.readLong(KryoDataInput.java:91)
>>> >> >> >> at org.apache.hadoop.io.LongWritable.readFields(LongWritable.java:47)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:245)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:278)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedDataStore.loadPartitionDataProxy(DiskBackedDataStore.java:234)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:311)
>>> >> >> >> at org.apache.giraph.ooc.command.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:66)
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:99)
>>> >> >> >> ... 6 more
>>> >> >> >> 16/11/06 15:47:49 FATAL graph.GraphTaskManager: uncaughtException: OverrideExceptionHandler on thread ooc-io-0, msg = call: execution of IO command LoadPartitionIOCommand: (partitionId = 234, superstep = 2) failed!, exiting...
>>> >> >> >> java.lang.RuntimeException: call: execution of IO command LoadPartitionIOCommand: (partitionId = 234, superstep = 2) failed!
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:115)
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:36)
>>> >> >> >> at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
>>> >> >> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> >> >> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> >> >> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> >> >> >> at java.lang.Thread.run(Thread.java:745)
>>> >> >> >> Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
>>> >> >> >> at com.esotericsoftware.kryo.io.Input.require(Input.java:199)
>>> >> >> >> at com.esotericsoftware.kryo.io.UnsafeInput.readLong(UnsafeInput.java:112)
>>> >> >> >> at com.esotericsoftware.kryo.io.KryoDataInput.readLong(KryoDataInput.java:91)
>>> >> >> >> at org.apache.hadoop.io.LongWritable.readFields(LongWritable.java:47)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:245)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:278)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedDataStore.loadPartitionDataProxy(DiskBackedDataStore.java:234)
>>> >> >> >> at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:311)
>>> >> >> >> at org.apache.giraph.ooc.command.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:66)
>>> >> >> >> at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:99)
>>> >> >> >> ... 6 more
>>> >> >> >> 16/11/06 15:47:49 ERROR worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/giraph_yarn_application_1478342673283_0009/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/datanode6.webmeup.com_5 on superstep 1
>>> >> >> >>
>>> >> >> >> We looked into one thread,
>>> >> >> >> http://mail-archives.apache.org/mod_mbox/giraph-user/201607.mbox/%3CCAECWHa3MOqubf8--wMVhzqOYwwZ0ZuP6_iiqTE_xT%3DoLJAAPQw%40mail.gmail.com%3E
>>> >> >> >> but it is rather old, and at that time the answer was "do not use it yet"
>>> >> >> >> (see the reply at
>>> >> >> >> http://mail-archives.apache.org/mod_mbox/giraph-user/201607.mbox/%3CCAH1LQfdbpbZuaKsu1b7TCwOzGMxi_vf9vYi6Xg_Bp8o43H7u%2Bw%40mail.gmail.com%3E).
>>> >> >> >> Does that advice still hold today? We would like to use the new advanced
>>> >> >> >> adaptive OOC approach if possible...
>>> >> >> >>
>>> >> >> >> Thank you in advance; any help or hint would be really appreciated.
>>> >> >> >>
>>> >> >> >> Best Regards,
>>> >> >> >> Denis Dudinski
>>> >> >> >
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>>
>>
>
