giraph-dev mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: [jira] [Created] (GIRAPH-249) Move part of the graph out-of-core when memory is low
Date Fri, 17 Aug 2012 15:17:46 GMT
Yes, that is definitely the direction you may want to take at some
point. That is basically what Stanford GPS does as well, and Stratosphere
too.

On Friday, August 17, 2012, Alessandro Presta wrote:

> I think at that point it would be worth having a new logical place for
> vertex/edge representation at worker- or partition-level.
> Avery had some ideas about this.
>
> Basically right now we're giving the user the freedom (and responsibility)
> to choose a representation (both in-memory and for serialization), but
> another way to go would be to take care of all that at infrastructure
> level and expose only one Vertex class (where the user only defines the
> computation details and everything else is abstracted away). Then we could
> play around with compact representations and even more disruptive
> strategies (like streaming the graph/messages and re-using objects).
>
> On 8/17/12 2:30 PM, "Gianmarco De Francisci Morales" <gdfm@apache.org>
> wrote:
>
> >I was under the impression that 100k was the upper limit to make things
> >work without crashing.
> >
> >In any case, if one wanted to use a compressed memory representation by
> >aggregating different edge lists together, could one use the worker
> >context as a central point of access to the compressed graphs?
> >I can imagine a vertex class that has only the ID and uses the worker
> >context to access its edge list (i.e. it is only a client to a central
> >per-machine repository).
> >Vertexes in the same partition would share this data structure.
> >
> >Is there any obvious technical fallacy in this scheme?
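
The scheme described above (a vertex that holds only its ID and reads its edge list from a per-machine repository) could look roughly like the sketch below. Everything here is hypothetical: `SharedEdgeStore` and `ThinVertex` are illustration names, not Giraph classes, and the store is a plain compact array layout rather than a truly compressed one. How such a store would be reached through the worker context is left out.

```java
import java.util.Arrays;

/** Hypothetical per-worker edge repository; not part of the Giraph API. */
final class SharedEdgeStore {
    // Edges of local vertex v live in targets[offsets[v] .. offsets[v + 1]).
    private final int[] offsets;
    // Target IDs of all edge lists, concatenated into one array.
    private final int[] targets;

    SharedEdgeStore(int[][] adjacency) {
        offsets = new int[adjacency.length + 1];
        for (int v = 0; v < adjacency.length; v++) {
            offsets[v + 1] = offsets[v] + adjacency[v].length;
        }
        targets = new int[offsets[adjacency.length]];
        for (int v = 0; v < adjacency.length; v++) {
            System.arraycopy(adjacency[v], 0, targets, offsets[v], adjacency[v].length);
        }
    }

    int outDegree(int v) {
        return offsets[v + 1] - offsets[v];
    }

    /** Returns a copy of the edge list of local vertex v. */
    int[] edgesOf(int v) {
        return Arrays.copyOfRange(targets, offsets[v], offsets[v + 1]);
    }
}

/** Hypothetical thin vertex: keeps only its ID and asks the shared store for edges. */
final class ThinVertex {
    private final int id;
    private final SharedEdgeStore store;

    ThinVertex(int id, SharedEdgeStore store) {
        this.id = id;
        this.store = store;
    }

    int[] edges() {
        return store.edgesOf(id);
    }
}
```

With this layout the per-edge cost is one `int` plus a shared offsets array, instead of one object per edge. The sketch assumes dense local vertex indices and an immutable graph; mutations and non-integer IDs would need more machinery.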
> >
> >Cheers,
> >--
> >Gianmarco
> >
> >
> >
> >On Fri, Aug 17, 2012 at 3:18 PM, Alessandro Presta
> ><alessandro@fb.com> wrote:
> >
> >> The example where we actually ran out of memory was with 500K vertices
> >> and 500M edges, but yes, as a general rule we should strive to reduce our
> >> memory footprint in order to push the point where we need to go out of
> >> core as far away as possible.
> >>
> >> On 8/17/12 2:11 PM, "Gianmarco De Francisci Morales" <gdfm@apache.org>
> >> wrote:
> >>
> >> >Very interesting.
> >> >
> >> >On a side note, a graph with 100k vertices and 100M edges is largish
> >> >but not that big after all.
> >> >If it does not fit in 10+ GB of memory, it means that each edge
> >> >occupies around 100 bytes (amortizing the cost of the vertex over the
> >> >edges).
> >> >In my opinion this deserves some thought.
> >> >If memory is an issue, why not think about compressed memory
> >> >structures, at least for common graph formats?
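
The per-edge estimate above is simple division, worked through below as a small sketch. It assumes exactly 10 GB (decimal) and attributes the whole budget to the 100M edges, which is the amortization the message describes; `EdgeFootprint` is a made-up name for illustration.

```java
/** Hypothetical helper for the back-of-the-envelope estimate; not Giraph code. */
final class EdgeFootprint {
    /** Average bytes per edge when the whole memory budget is attributed to edges. */
    static double bytesPerEdge(long totalBytes, long edgeCount) {
        return (double) totalBytes / edgeCount;
    }

    public static void main(String[] args) {
        long memoryBytes = 10L * 1000 * 1000 * 1000; // 10 GB, decimal, as an assumption
        long edges = 100L * 1000 * 1000;             // 100M edges
        System.out.println(bytesPerEdge(memoryBytes, edges)); // prints 100.0
    }
}
```

For comparison, a bare 4-byte integer target ID per edge would be 25 times smaller than that figure.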
> >> >
> >> >Cheers,
> >> >--
> >> >Gianmarco
> >> >
> >> >
> >> >
> >> >On Wed, Aug 15, 2012 at 11:20 PM, Eli Reisman
> >> ><initialcontext@gmail.com> wrote:
> >> >
> >> >> Great metrics, this made a very interesting read, and great code too
> >> >> as always. This must have been a lot of work. I like the idea of
> >> >> eliminating the extra temporary storage data structures where possible,
> >> >> even when not going out-of-core. I think that, plus avoiding extra object
> >> >> creation during the workflow, can still do a lot for in-core jobs' memory
> >> >> profile, but this is looking really good, and it sounds like with the
> >> >> config options it's also pluggable depending on your hardware situation,
> >> >> so it sounds great to me.
> >> >> Great work!
> >> >>
> >> >> On Wed, Aug 15, 2012 at 12:23 PM, Alessandro Presta (JIRA)
> >> >> <jira@apache.org> wrote:
> >> >>
> >> >> >
> >> >> >     [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437 ]
> >> >> >
> >> >> > Alessandro Presta commented on GIRAPH-249:
> >> >> > ------------------------------------------
> >> >> >
> >> >> > Thanks Claudio, good observation.
> >> >> > You got me curious so I quickly ran a shortest paths benchmark.
> >> >> >
> >> >> > 500k vertices, 100 edges/vertex, 10 workers
> >> >> >
> >> >> > This is with trunk:
> >> >> >
> >> >> > {code}
> >> >> > hadoop jar giraph-trunk.jar
> >> >> > org.apache.giraph.benchmark.ShortestPathsBenchmark
> >> >>-Dgiraph.useN



-- 
   Claudio Martella
   claudio.martella@gmail.com
