giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: Primitives vs Objects (the Movie!)
Date Wed, 07 Sep 2011 19:51:50 GMT
On Wed, Sep 7, 2011 at 6:33 AM, Avery Ching <aching@apache.org> wrote:
>
>    I'm not sure that this is precisely the right API, but exposing the
> inner SortedMap definitely has a "leaky abstraction" smell to it to me,
> especially where there are no examples or algorithm implementations in the
> codebase which currently *need* this to be exposed in such a way.
>
> We could expose just the Map and provide helper methods as you suggest.
> It's a thought.  You're right that it's not used in this way in any
> examples.
>

Maybe a few more examples would help?  Cases where you want to do a BSP
computation where the total sort (both the vertexes, and the edges for each
vertex) is required, as is the random access nature of the Map?

> I can see how the Edge object is nice, it does seem like a better api then
> basically just a Map<I,E>.  Why not a sorted List<Edge<I,E>>, then,
where
> the Comparator ignores the edge value, but only sorts by the dest vertexId?
>  Because the contract of List doesn't *require* the sort order to remain
> fixed, I guess?  This yet again underscores for me the "spidey-sense"
> wrongness of exposing the SortedMap exactly as it is.  Maybe just expose the
> methods on a map that are expected to be needed, and methods can be added as
> time goes on?
>
> I think Map over List makes sense for the edges to find them quickly (my
> personal opinion).
>

Yeah, if there are cases where you're doing searches for paths, this is
probably true, some form of random access is probably best.

What about something as simple as:

  /*
   * @return an immutable, sorted view of the target vertices
   */
  public ImmutableList<I> getTargetVertices();

  /*
   * @return E edge for the supplied vertex (or null if there isn't an edge
to this vertex from the current one)
   */
  public <E> Edge<I,E> getEdge(I vertex);

in conjunction with the already existing

  public boolean addEdge(Edge<I,E> edge);

you have all the actions of a sorted, random-access map, but without
exposing the way it's stored internally.  Why is there addEdge() but not
removeEdge()?


>    Problem 1)  static mutable state (ew!) - lots of references to
> Vertex.setSuperStep(), Vertex.setNumVertices(), Vertex.setNumEdges(),
> Vertex.setGraphMapper(), etc... which means that implemeting BasicVertex /
> MutableVertex isn't good enough, you have to actually be a subclass of
> Vertex! :\
>
> Oh, you found our dirty little secret. =)  Yes, we do have static mutable
> state.  The concern is that if every vertex actually keeps track of all
> those things as members, I think a decent amount of memory would be wasted.
> I'm sure there is a cleaner way to implement this though (i.e. Flyweight
> pattern perhaps), just never got around to fixing it.
>

Yeah, keeping one extra 4 or 8 bytes of a reference to say, the GraphMapper
inside of each Vertex, and having the GraphMapper itself centralize all of
these generic Vertex attributes might be the way to go.  I'll see if I can
whip that up as part of my primitive patch.


>    Problem 2) Vertex.class is directly referenced as the base class in
> places like BspUtils, etc.
>
> We never expected to have another base Vertex class. =)
>

True, but in many of these places, it's just the interface type which is
required, right?  BaseVertex is what they all implement, is there a reason
why this can't be what's referred to in these cases?

> Most likely the easiest solution would be to pull out a AbstractVertex
> class, which holds this ugly static state, and make said state protected,
> instead of private, and have Vertex and LongDoubleFloatDoubleVertex both
> subclass it, and change reflection references from setClass(VERTEX_CLASS,
> vertexClass, Vertex.class); to setClass(VERTEX_CLASS, vertexClass,
> AbstractVertex.class);
>
> That sounds reasonable.  One thing we have thought about is compressing the
> vertices in memory at the cost of some CPU.  While this is orthogonal, they
> are both efforts to conserve memory.
>

Storing the edges as primitive arrays could be viewed as a fairly maximally
efficient compression, in cases where you have primitive-able types.  I'm
not sure how easy it would be to have a generically compressible
SortedMap<I, Edge<E,I>>, in a way which allowed mutations of the map...


> Memory consumption (see above), these are aggregate members for all the
> vertices.
>

Ok, I'll see what it looks like if this data is moved to something like a
VertexState object attached to the GraphMapper, which all the vertexes can
have a reference to.

Mime
View raw message