From: Claudio Martella <claudio.martella@gmail.com>
Date: Fri, 13 Sep 2013 11:27:55 +0200
Subject: Re: Giraph offloadPartition fails creation directory
To: user@giraph.apache.org

I have no idea without the logs, especially when it happens rarely.

On Fri, Sep 13, 2013 at 12:33 AM, Alexander Asplund <alexasplund@gmail.com> wrote:

> Actually, why does it say it fails to create a directory in the first
> place, when it is trying to write files?
>
> On Sep 12, 2013 3:04 PM, "Alexander Asplund" <alexasplund@gmail.com> wrote:
>
>> I can also add that there is no such issue with DiskBackedMessageStore.
>> It successfully creates a large number of store files, and never starts
>> failing.
>>
>> On Sep 12, 2013 2:11 PM, "Alexander Asplund" <alexasplund@gmail.com> wrote:
>>
>>> It's very strange. It is definitely failing on some partitions.
>>> Currently the disk usage of an offloading worker corresponds roughly to
>>> the size of its part of the graph, but the worker attempts to create
>>> additional partitions, and this fails.
>>>
>>> On Sep 12, 2013 2:07 PM, "Alexander Asplund" <alexasplund@gmail.com> wrote:
>>>
>>>> Actually, I take that back. It seems it does succeed in creating
>>>> partitions; it just struggles with it sometimes. Should I be worried
>>>> about these errors if the partition directories seem to be filling up?
>>>>
>>>> On Sep 11, 2013 6:38 PM, "Claudio Martella" <claudio.martella@gmail.com> wrote:
>>>>
>>>>> Giraph does not offload partitions or messages to HDFS in the
>>>>> out-of-core module. It uses local disk on the computing nodes. By
>>>>> default, it uses the tasktracker local directory, where, for example,
>>>>> the distributed cache is stored.
>>>>>
>>>>> Could you provide the stack trace Giraph is spitting out when it fails?
>>>>>
>>>>> On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund <alexasplund@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm still trying to get Giraph to work on a graph that requires more
>>>>>> memory than is available. The problem is that when the workers try to
>>>>>> offload partitions, the offloading fails. The DiskBackedPartitionStore
>>>>>> fails to create the directory
>>>>>> _bsp/_partitions/job-xxxx/part-vertices-xxx (quoting roughly, from memory).
>>>>>>
>>>>>> The input or computation will then continue for a while, which I
>>>>>> believe is because it is still managing to hold everything in memory,
>>>>>> but at some point it reaches the limit where there simply is no more
>>>>>> heap space, and it crashes with OOM.
>>>>>>
>>>>>> Has anybody had this problem with Giraph failing to create HDFS
>>>>>> directories?
>>>>>
>>>>> --
>>>>> Claudio Martella
>>>>> claudio.martella@gmail.com

--
Claudio Martella
claudio.martella@gmail.com
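Neither the logs nor Giraph's source are quoted in this thread, so the following is only a hypothesis about the question of why a "fails to create directory" message can show up even though partition files are being written. In Java, File.mkdirs() returns false not only on a real failure but also when the directory already exists, so code that logs an error whenever mkdirs() returns false would report a spurious failure for every offload after the first one into the same directory. The sketch below uses hypothetical class, method, and job-id names (none of them are Giraph's); it contrasts a naive check with a robust one, and also shows that a relative path such as _bsp/_partitions, assuming one is used, resolves against the JVM working directory on local disk rather than against HDFS.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothesis only: shows how a "failed to create directory" message can be
// spurious. Class, method, and job-id names are hypothetical, not Giraph's.
public class MkdirsDemo {

    // Naive check: File.mkdirs() returns false both on a real failure and
    // when the directory already exists.
    static boolean naiveEnsureDir(File dir) {
        return dir.mkdirs();
    }

    // Robust check: treat an already existing directory as success.
    static boolean ensureDir(File dir) {
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        // Work under a fresh temporary directory to avoid touching real data.
        File base = Files.createTempDirectory("offload-demo").toFile();
        File partDir = new File(base, "_bsp/_partitions/job_hypothetical_0001");

        System.out.println(naiveEnsureDir(partDir)); // true: directories created
        System.out.println(naiveEnsureDir(partDir)); // false: already exists, not a failure
        System.out.println(ensureDir(partDir));      // true

        // Assuming a relative path like "_bsp/_partitions" is used, it resolves
        // against the JVM working directory on local disk, not against HDFS.
        File relative = new File("_bsp/_partitions");
        System.out.println(relative.getAbsolutePath());
    }
}
```

Creating the directory with java.nio.file.Files.createDirectories avoids the ambiguity, since it succeeds when the directory already exists and throws an IOException on a real failure.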
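For readers new to the out-of-core idea discussed in this thread, here is a toy model of what a disk-backed partition store does: keep at most a fixed number of partitions on the heap, write the rest to files on local disk, and read them back on demand. This is a simplified sketch under our own assumptions, not Giraph's DiskBackedPartitionStore; the class name ToyPartitionStore, the least-recently-used eviction policy, and the one-vertex-per-line file format are all hypothetical, and only the part-vertices-<id> file name echoes the path quoted in the original question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of an out-of-core partition store (hypothetical, not Giraph's API):
// at most maxInMemory partitions stay on the heap, the rest live in local files.
public class ToyPartitionStore {
    private final int maxInMemory;
    private final Path spillDir;
    // accessOrder = true: iteration starts at the least recently used partition.
    private final LinkedHashMap<Integer, List<String>> inMemory =
            new LinkedHashMap<>(16, 0.75f, true);

    ToyPartitionStore(int maxInMemory, Path spillDir) throws IOException {
        if (maxInMemory < 1) {
            throw new IllegalArgumentException("maxInMemory must be at least 1");
        }
        this.maxInMemory = maxInMemory;
        this.spillDir = spillDir;
        // Unlike File.mkdirs(), this does not treat an existing directory as an error.
        Files.createDirectories(spillDir);
    }

    private Path fileFor(int partitionId) {
        // Hypothetical file name, echoing "part-vertices-xxx" from the thread.
        return spillDir.resolve("part-vertices-" + partitionId);
    }

    // Adds a partition (one vertex per line) and spills LRU partitions past the limit.
    void put(int partitionId, List<String> vertices) throws IOException {
        Files.deleteIfExists(fileFor(partitionId)); // drop any stale spilled copy
        inMemory.put(partitionId, vertices);
        while (inMemory.size() > maxInMemory) {
            Map.Entry<Integer, List<String>> eldest = inMemory.entrySet().iterator().next();
            int id = eldest.getKey();
            Files.write(fileFor(id), eldest.getValue(), StandardCharsets.UTF_8);
            inMemory.remove(id);
        }
    }

    // Returns the partition, reading it back from disk if it was spilled.
    List<String> get(int partitionId) throws IOException {
        List<String> vertices = inMemory.get(partitionId);
        if (vertices == null) {
            Path file = fileFor(partitionId);
            if (!Files.exists(file)) {
                return null; // unknown partition
            }
            vertices = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
            put(partitionId, vertices); // may spill another partition
        }
        return vertices;
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    boolean isOnDisk(int partitionId) {
        return Files.exists(fileFor(partitionId));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("toy-partitions");
        ToyPartitionStore store = new ToyPartitionStore(2, dir);
        store.put(0, List.of("v0", "v1"));
        store.put(1, List.of("v2"));
        store.put(2, List.of("v3", "v4"));         // third partition: partition 0 is spilled
        System.out.println(store.isOnDisk(0));     // true
        System.out.println(store.inMemoryCount()); // 2
        System.out.println(store.get(0));          // [v0, v1]; partition 1 is spilled
        System.out.println(store.isOnDisk(1));     // true
    }
}
```

With a limit of two partitions, inserting a third spills the least recently used one to disk; reading it back brings it onto the heap again and spills another. A real store would also need a binary serialization format for vertices and edges and protection against concurrent access, both of which this toy omits.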