From: Han JU
To: user@giraph.apache.org
Date: Fri, 26 Jul 2013 14:17:44 +0200
Subject: Re: Scaling Problem

What's your cluster configuration? How do you invoke the job?



2013/7/26 jerome richard <jeromerichard111@msn.com>
> Hi,

> I encountered a critical scaling problem using Giraph. I wrote a very simple algorithm to test Giraph on large graphs: a connectivity test. It works on relatively large graphs (3,072,441 nodes and 117,185,083 edges) but not on a very large graph (52,000,000 nodes and 2,000,000,000 edges).
> In fact, while processing the biggest graph, the Giraph core seems to fail after superstep 14 (15 on some jobs). The input graph is 30 GB stored as text, and the output is also stored as text. 9 worker jobs are used to process the graph.
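
Since the program itself was not posted, here is a rough idea of what I assume such a connectivity test does: each vertex repeatedly takes the smallest label seen among its neighbours until nothing changes, and the graph is connected if a single label remains. This is a minimal plain-Java sketch of that idea, not Giraph code; `ConnectivitySketch`, `propagate`, and `isConnected` are hypothetical names, and the real program may work differently.

```java
import java.util.Arrays;

/** Plain-Java sketch of a connectivity test by minimum-label propagation (not the Giraph API). */
final class ConnectivitySketch {

    /**
     * Runs label propagation until no label changes.
     * edges[i] = {u, v} is an undirected edge between vertices u and v.
     * Final labels are written into the labels array; the return value is the
     * number of rounds ("supersteps") executed, including the last idle one.
     */
    static int propagate(int vertexCount, int[][] edges, int[] labels) {
        for (int v = 0; v < vertexCount; v++) {
            labels[v] = v; // every vertex starts with its own id as label
        }
        int supersteps = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            // New labels are computed only from the previous round's labels,
            // mirroring the bulk-synchronous model.
            int[] next = Arrays.copyOf(labels, vertexCount);
            for (int[] e : edges) {
                int u = e[0];
                int v = e[1];
                if (labels[u] < next[v]) { next[v] = labels[u]; changed = true; }
                if (labels[v] < next[u]) { next[u] = labels[v]; changed = true; }
            }
            System.arraycopy(next, 0, labels, 0, vertexCount);
            supersteps++;
        }
        return supersteps;
    }

    /** The graph is connected if every vertex ends up with label 0. */
    static boolean isConnected(int vertexCount, int[][] edges) {
        int[] labels = new int[vertexCount];
        propagate(vertexCount, edges, labels);
        for (int label : labels) {
            if (label != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] path = {{0, 1}, {1, 2}, {2, 3}};
        System.out.println("connected: " + isConnected(4, path));
    }
}
```

With a scheme like this, the number of supersteps grows with the diameter of the graph, so it may be worth checking how many supersteps the smaller graph needed before it finished.
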

> Here is the stack trace of the jobs (it is the same for all 9 jobs):
>     java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Unknown Source)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>     Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201307260439_0006/_applicationAttemptsDir/0/_superstepDir/97/_addressesAndPartitions after 3 tries!
>         at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>         at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:678)
>         at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:248)
>         at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
>         ... 7 more
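
The "after 3 tries!" wording suggests that the existence check on the ZooKeeper path is retried a fixed number of times before the worker gives up. The sketch below shows that general pattern in plain Java. It is not the actual `ZooKeeperExt` code; `BoundedRetrySketch`, `existsWithRetries`, and the sleep parameter are hypothetical and only illustrate how such an exception can come about.

```java
import java.util.concurrent.Callable;

/** Hypothetical bounded-retry helper; an illustration, not the real ZooKeeperExt. */
final class BoundedRetrySketch {

    /**
     * Calls the check up to maxTries times, sleeping between failed attempts.
     * Returns the check's result on the first success, or throws
     * IllegalStateException (chaining the last failure) once all tries are used.
     */
    static boolean existsWithRetries(Callable<Boolean> check, String path,
                                     int maxTries, long sleepMillis) {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return check.call(); // success: stop retrying
            } catch (Exception e) {
                lastFailure = e; // keep the real cause for the final report
            }
            if (attempt < maxTries) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while checking " + path, e);
                }
            }
        }
        throw new IllegalStateException(
            "exists: Failed to check " + path + " after " + maxTries + " tries!", lastFailure);
    }

    public static void main(String[] args) {
        // A check that always succeeds returns immediately.
        System.out.println(existsWithRetries(() -> true, "/example/path", 3, 10L));
    }
}
```

If the real code follows a similar pattern, the exception above is only the final symptom, and any per-attempt error logged earlier in the task log would say more about why the check failed.
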

> Could you help me solve this problem?
> If you need the code of the program, I can post it here (the code is relatively small).
>
> Thanks,
> Jérôme.




--
JU Han

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888