From: Claudio Martella
To: "dev@giraph.apache.org"
Date: Fri, 17 Aug 2012 17:17:46 +0200
Subject: Re: [jira] [Created] (GIRAPH-249) Move part of the graph out-of-core when memory is low

Yes, that is definitely the direction you may want to take at some point.
That is basically what Stanford GPS does as well, and Stratosphere too.

On Friday, August 17, 2012, Alessandro Presta wrote:
> I think at that point it would be worth having a new logical place for
> vertex/edge representation at the worker or partition level.
> Avery had some ideas about this.
>
> Basically, right now we're giving the user the freedom (and
> responsibility) to choose a representation (both in-memory and for
> serialization), but another way to go would be to take care of all that
> at the infrastructure level and expose only one Vertex class (where the
> user only defines the computation details and everything else is
> abstracted away). Then we could play around with compact representations
> and even more disruptive strategies (like streaming the graph/messages
> and re-using objects).
>
> On 8/17/12 2:30 PM, "Gianmarco De Francisci Morales" wrote:
>
> >I was under the impression that 100k was the upper limit to make things
> >work without crashing.
> >
> >In any case, if one wanted to use a compressed memory representation by
> >aggregating different edge lists together, could one use the worker
> >context as a central point of access to the compressed graphs?
> >I can imagine a vertex class that has only the ID and uses the worker
> >context to access its edge list (i.e. it is only a client to a central
> >per-machine repository).
> >Vertices in the same partition would share this data structure.
> >
> >Is there any obvious technical fallacy in this scheme?
> >
> >Cheers,
> >--
> >Gianmarco
> >
> >On Fri, Aug 17, 2012 at 3:18 PM, Alessandro Presta wrote:
> >
> >> The example where we actually go out of memory was with 500K vertices
> >> and 500M edges, but yes, as a general rule we should strive to reduce
> >> our memory footprint in order to push the point where we need to go
> >> out of core as far away as possible.
> >>
> >> On 8/17/12 2:11 PM, "Gianmarco De Francisci Morales" wrote:
> >>
> >> >Very interesting.
> >> >
> >> >On a side note, a graph with 100k vertices and 100M edges is largish
> >> >but not that big after all.
> >> >If it does not fit in 10+ GB of memory, it means that each edge
> >> >occupies around 100 B (amortizing the cost of the vertex over the
> >> >edges). In my opinion this deserves some thought.
> >> >If memory is an issue, why not think about compressed memory
> >> >structures, at least for common graph formats?
> >> >
> >> >Cheers,
> >> >--
> >> >Gianmarco
> >> >
> >> >On Wed, Aug 15, 2012 at 11:20 PM, Eli Reisman wrote:
> >> >
> >> >> Great metrics, this made a very interesting read, and great code too
> >> >> as always. This must have been a lot of work. I like the idea of
> >> >> eliminating the extra temporary storage data structures where
> >> >> possible, even when not going out-of-core. I think that, plus
> >> >> avoiding extra object creation during the workflow, can still do a
> >> >> lot for in-core jobs' memory profile, but this is looking really
> >> >> good, and it sounds like with the config options it's also pluggable
> >> >> depending on your hardware situation, so it sounds great to me.
> >> >> Great work!
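A minimal sketch of the shared per-partition repository suggested above, assuming a CSR-style layout (one offsets array plus one flat array of edge target IDs) and unweighted edges. The names CompactPartitionSketch, CompactAdjacencyStore and VertexView are hypothetical and are not Giraph API; how a vertex would reach the store (through the worker context, as suggested, or otherwise) is left out. For scale: 10 GB spread over 100M edges is about 100 bytes per edge, while here each edge target costs 8 bytes in a long[], plus 4 bytes of offset per vertex. Iterating through a LongConsumer also avoids allocating an object per edge.

```java
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch, not Giraph API: one shared CSR-style adjacency store
 * per partition, and a vertex view that holds only an index into it.
 */
public class CompactPartitionSketch {

  /** Edges of local vertex i are targets[offsets[i]] .. targets[offsets[i + 1] - 1]. */
  static final class CompactAdjacencyStore {
    private final long[] vertexIds;
    private final int[] offsets;
    private final long[] targets;

    CompactAdjacencyStore(long[] vertexIds, long[][] adjacency) {
      if (vertexIds.length != adjacency.length) {
        throw new IllegalArgumentException("one adjacency list per vertex expected");
      }
      this.vertexIds = vertexIds.clone();
      this.offsets = new int[adjacency.length + 1];
      int total = 0;
      for (int i = 0; i < adjacency.length; i++) {
        offsets[i] = total;
        total += adjacency[i].length;
      }
      offsets[adjacency.length] = total;
      this.targets = new long[total];
      // Copy every adjacency list into one flat primitive array.
      for (int i = 0; i < adjacency.length; i++) {
        System.arraycopy(adjacency[i], 0, targets, offsets[i], adjacency[i].length);
      }
    }

    int numVertices() { return vertexIds.length; }

    long vertexId(int local) { return vertexIds[local]; }

    int numEdges(int local) { return offsets[local + 1] - offsets[local]; }

    /** Visits out-edge targets without allocating an object per edge. */
    void forEachTarget(int local, LongConsumer visitor) {
      for (int e = offsets[local]; e < offsets[local + 1]; e++) {
        visitor.accept(targets[e]);
      }
    }

    /** Array payload in bytes (8 per long, 4 per int), ignoring JVM object headers. */
    long payloadBytes() {
      return 8L * vertexIds.length + 4L * offsets.length + 8L * targets.length;
    }
  }

  /** A vertex is just (store, index): no per-vertex edge list object. */
  static final class VertexView {
    private final CompactAdjacencyStore store;
    private final int local;

    VertexView(CompactAdjacencyStore store, int local) {
      this.store = store;
      this.local = local;
    }

    long getId() { return store.vertexId(local); }

    int getNumEdges() { return store.numEdges(local); }

    void forEachTarget(LongConsumer visitor) { store.forEachTarget(local, visitor); }
  }

  public static void main(String[] args) {
    CompactAdjacencyStore store = new CompactAdjacencyStore(
        new long[] {10L, 20L, 30L},
        new long[][] {{20L, 30L}, {30L}, {}});
    VertexView v = new VertexView(store, 0);
    long[] sum = {0L};
    v.forEachTarget(t -> sum[0] += t);
    // prints "10 has 2 edges, target sum 50"
    System.out.println(v.getId() + " has " + v.getNumEdges() + " edges, target sum " + sum[0]);
    // prints "payload bytes: 64" (3 ids * 8 + 4 offsets * 4 + 3 targets * 8)
    System.out.println("payload bytes: " + store.payloadBytes());
  }
}
```

Each VertexView holds only a reference and an int, so all vertices of a partition can share one store. The trade-off of a CSR layout is that adding edges requires rebuilding the arrays, which this sketch does not attempt.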
> >> >>
> >> >> On Wed, Aug 15, 2012 at 12:23 PM, Alessandro Presta (JIRA) wrote:
> >> >>
> >> >> >
> >> >> > [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437 ]
> >> >> >
> >> >> > Alessandro Presta commented on GIRAPH-249:
> >> >> > ------------------------------------------
> >> >> >
> >> >> > Thanks Claudio, good observation.
> >> >> > You got me curious so I quickly ran a shortest paths benchmark.
> >> >> >
> >> >> > 500k vertices, 100 edges/vertex, 10 workers
> >> >> >
> >> >> > This is with trunk:
> >> >> >
> >> >> > {code}
> >> >> > hadoop jar giraph-trunk.jar
> >> >> > org.apache.giraph.benchmark.ShortestPathsBenchmark
> >> >> > -Dgiraph.useN

-- 
Claudio Martella
claudio.martella@gmail.com
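A minimal sketch of what the subject line describes (moving part of the graph out of core), assuming a fixed cap on the number of in-memory partitions and least-recently-used eviction. SpillingPartitionStore, maxInMemory and the partition-N.bin file naming are hypothetical; they are not the classes or config options of the GIRAPH-249 patch. A real policy would more likely react to heap pressure (for example via Runtime.getRuntime().freeMemory()) than to a fixed count; the count is used here only to keep the behavior deterministic. Partitions are reduced to a long[] of edge targets to keep serialization short.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/**
 * Hypothetical sketch, not Giraph API: keep at most maxInMemory partitions
 * on the heap and spill the least recently used ones to disk.
 */
public class SpillingPartitionStore {
  private final int maxInMemory;
  private final Path spillDir;
  // accessOrder = true: iteration goes from least to most recently accessed.
  private final LinkedHashMap<Integer, long[]> inMemory = new LinkedHashMap<>(16, 0.75f, true);
  private final Set<Integer> onDisk = new HashSet<>();

  public SpillingPartitionStore(int maxInMemory, Path spillDir) {
    if (maxInMemory < 1) {
      throw new IllegalArgumentException("maxInMemory must be at least 1");
    }
    this.maxInMemory = maxInMemory;
    this.spillDir = spillDir;
  }

  /** Adds or replaces a partition, spilling older ones if over the cap. */
  public void put(int partitionId, long[] edgeTargets) throws IOException {
    if (onDisk.remove(partitionId)) {
      Files.deleteIfExists(fileFor(partitionId)); // drop the stale spilled copy
    }
    inMemory.put(partitionId, edgeTargets);
    evictIfNeeded();
  }

  /** Returns a partition, reading it back if it was spilled; null if unknown. */
  public long[] get(int partitionId) throws IOException {
    long[] edgeTargets = inMemory.get(partitionId);
    if (edgeTargets == null && onDisk.remove(partitionId)) {
      edgeTargets = readAndDelete(partitionId);
      inMemory.put(partitionId, edgeTargets);
      evictIfNeeded();
    }
    return edgeTargets;
  }

  public int partitionsInMemory() { return inMemory.size(); }

  public int partitionsOnDisk() { return onDisk.size(); }

  private void evictIfNeeded() throws IOException {
    while (inMemory.size() > maxInMemory) {
      // The first entry in access order is the least recently used one.
      Iterator<Map.Entry<Integer, long[]>> it = inMemory.entrySet().iterator();
      Map.Entry<Integer, long[]> eldest = it.next();
      write(eldest.getKey(), eldest.getValue());
      onDisk.add(eldest.getKey());
      it.remove();
    }
  }

  private Path fileFor(int partitionId) {
    return spillDir.resolve("partition-" + partitionId + ".bin");
  }

  private void write(int partitionId, long[] edgeTargets) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(fileFor(partitionId))))) {
      out.writeInt(edgeTargets.length);
      for (long target : edgeTargets) {
        out.writeLong(target);
      }
    }
  }

  private long[] readAndDelete(int partitionId) throws IOException {
    Path file = fileFor(partitionId);
    long[] edgeTargets;
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      edgeTargets = new long[in.readInt()];
      for (int i = 0; i < edgeTargets.length; i++) {
        edgeTargets[i] = in.readLong();
      }
    }
    Files.delete(file);
    return edgeTargets;
  }

  public static void main(String[] args) throws IOException {
    SpillingPartitionStore store = new SpillingPartitionStore(2, Files.createTempDirectory("spill"));
    store.put(0, new long[] {1, 2});
    store.put(1, new long[] {3});
    store.put(2, new long[] {4, 5, 6}); // over the cap: partition 0 is spilled
    // prints "2 in memory, 1 on disk"
    System.out.println(store.partitionsInMemory() + " in memory, " + store.partitionsOnDisk() + " on disk");
    System.out.println(Arrays.toString(store.get(0))); // reloads 0, spills 1; prints "[1, 2]"
  }
}
```

A LinkedHashMap constructed with accessOrder set to true keeps entries ordered from least to most recently accessed, so the first entry of its iterator is always the eviction candidate.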