giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <>
Subject Re: Primitives vs Objects (the Movie!)
Date Wed, 07 Sep 2011 19:51:50 GMT
On Wed, Sep 7, 2011 at 6:33 AM, Avery Ching <> wrote:
>    I'm not sure that this is precisely the right API, but exposing the
> inner SortedMap definitely has a "leaky abstraction" smell to it to me,
> especially where there are no examples or algorithm implementations in the
> codebase which currently *need* this to be exposed in such a way.
> We could expose just the Map and provide helper methods as you suggest.
> It's a thought.  You're right that it's not used in this way in any
> examples.

Maybe a few more examples would help?  Cases where you want to do a BSP
computation where the total sort (both the vertexes, and the edges for each
vertex) is required, as is the random access nature of the Map?

> I can see how the Edge object is nice, it does seem like a better api then
> basically just a Map<I,E>.  Why not a sorted List<Edge<I,E>>, then,
> the Comparator ignores the edge value, but only sorts by the dest vertexId?
>  Because the contract of List doesn't *require* the sort order to remain
> fixed, I guess?  This yet again underscores for me the "spidey-sense"
> wrongness of exposing the SortedMap exactly as it is.  Maybe just expose the
> methods on a map that are expected to be needed, and methods can be added as
> time goes on?
> I think Map over List makes sense for the edges to find them quickly (my
> personal opinion).

Yeah, if there are cases where you're doing searches for paths, this is
probably true, some form of random access is probably best.

What about something as simple as:

   * @return an immutable, sorted view of the target vertices
  public ImmutableList<I> getTargetVertices();

   * @return E edge for the supplied vertex (or null if there isn't an edge
to this vertex from the current one)
  public <E> Edge<I,E> getEdge(I vertex);

in conjunction with the already existing

  public boolean addEdge(Edge<I,E> edge);

you have all the actions of a sorted, random-access map, but without
exposing the way it's stored internally.  Why is there addEdge() but not

>    Problem 1)  static mutable state (ew!) - lots of references to
> Vertex.setSuperStep(), Vertex.setNumVertices(), Vertex.setNumEdges(),
> Vertex.setGraphMapper(), etc... which means that implemeting BasicVertex /
> MutableVertex isn't good enough, you have to actually be a subclass of
> Vertex! :\
> Oh, you found our dirty little secret. =)  Yes, we do have static mutable
> state.  The concern is that if every vertex actually keeps track of all
> those things as members, I think a decent amount of memory would be wasted.
> I'm sure there is a cleaner way to implement this though (i.e. Flyweight
> pattern perhaps), just never got around to fixing it.

Yeah, keeping one extra 4 or 8 bytes of a reference to say, the GraphMapper
inside of each Vertex, and having the GraphMapper itself centralize all of
these generic Vertex attributes might be the way to go.  I'll see if I can
whip that up as part of my primitive patch.

>    Problem 2) Vertex.class is directly referenced as the base class in
> places like BspUtils, etc.
> We never expected to have another base Vertex class. =)

True, but in many of these places, it's just the interface type which is
required, right?  BaseVertex is what they all implement, is there a reason
why this can't be what's referred to in these cases?

> Most likely the easiest solution would be to pull out a AbstractVertex
> class, which holds this ugly static state, and make said state protected,
> instead of private, and have Vertex and LongDoubleFloatDoubleVertex both
> subclass it, and change reflection references from setClass(VERTEX_CLASS,
> vertexClass, Vertex.class); to setClass(VERTEX_CLASS, vertexClass,
> AbstractVertex.class);
> That sounds reasonable.  One thing we have thought about is compressing the
> vertices in memory at the cost of some CPU.  While this is orthogonal, they
> are both efforts to conserve memory.

Storing the edges as primitive arrays could be viewed as a fairly maximally
efficient compression, in cases where you have primitive-able types.  I'm
not sure how easy it would be to have a generically compressible
SortedMap<I, Edge<E,I>>, in a way which allowed mutations of the map...

> Memory consumption (see above), these are aggregate members for all the
> vertices.

Ok, I'll see what it looks like if this data is moved to something like a
VertexState object attached to the GraphMapper, which all the vertexes can
have a reference to.

View raw message