giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <>
Subject Re: Primitives vs Objects (the Movie!)
Date Wed, 07 Sep 2011 05:05:00 GMT
On Wed, Sep 7, 2011 at 1:02 AM, Avery Ching <> wrote:
>  Yeah, one edge is pretty silly.  To get some real numbers, I should try
> it out with a more realistic (power-law distributed) bit of synthetic data.
> Agreed.

I'll see if I can write up some simple extensions of the PageRank benchmark
to test this out.  Added to
track it.

> Lots of questions here, I'll try my best. =)  Basically the idea is that as
> a sorted map, the user would be able to know that the edges are sorted by
> their destination ids.  Range based edge queries are possible (i.e. the
> edges with destinations from to com.zillow.www).  Basically
> this would give the users a bit more functionality than a basic map.  A list
> would require a full scan to find/remove an edge.

I see, that makes sense.  I wonder if inverting the API a bit would make
sense, instead of exposing such a concrete inner domain object like the
actual SortedMap?  By this, I mean: if you want to search / subselect edges
by some criterion, then instead of having the caller do

  for(Edge<E,V> edge : getOutEdgeMap().values()) {
    // select edges you want to do something with

  you expose to the caller:

  public Collection<Edge<E,V>> filterEdges(Predicate<Edge<E,V>> filter),
which the caller can use like

  for(Edge<E,V> edge : filterEdges(new Predicate<Edge<E,V>>() {
    boolean apply(Edge<E,V> e) {
      return e.getVertexId().compareTo("com.zillow.www") >= 0 &&
e.getVertexId().compareTo("") < 0;
  }) {
    // do stuff on the edges you wanted to get

  I'm not sure that this is precisely the right API, but exposing the inner
SortedMap definitely has a "leaky abstraction" smell to it to me, especially
where there are no examples or algorithm implementations in the codebase
which currently *need* this to be exposed in such a way.

>  As far as why not (Sorted)Map<I, E>, I believe we actually used to have
> that interface and I changed it use the Edge class.  This was primarily done
> since Edge objects would be used for edge requests (i.e. void
> addEdgeRequest(I sourceVertexId, Edge<I, E> edge) stores the add edge
> request with the Edge object in a list.  I think I also changed the user
> interfaces to use the Edge object since I initially thought it might make
> usage it a little clearer for users (i.e. edge.getDestVertexid() and some of
> the serialization a little simpler, but looking at it now, that might not be
> the case.  We can probably go back to Map<I, E> or SortedMap<I, E> to save
> some memory and internally use the Edge object to store the add edge
> requests.

I can see how the Edge object is nice, it does seem like a better api then
basically just a Map<I,E>.  Why not a sorted List<Edge<I,E>>, then, where
the Comparator ignores the edge value, but only sorts by the dest vertexId?
 Because the contract of List doesn't *require* the sort order to remain
fixed, I guess?  This yet again underscores for me the "spidey-sense"
wrongness of exposing the SortedMap exactly as it is.  Maybe just expose the
methods on a map that are expected to be needed, and methods can be added as
time goes on?

>  For my present case, I can probably hack around it by making a virtual
>  SortedMap<LongWritable, Edge<LongWritable,FloatWritable>> which
> implements that interface, yet is backed by a
> primitive map, but I'm trying to understand what the API is trying to
> support...
> Hope my explanation helped somewhat.

Yes, definitely.  But I've run into some ugly ugliness when I tried to run
my code:

  Problem 1)  static mutable state (ew!) - lots of references to
Vertex.setSuperStep(), Vertex.setNumVertices(), Vertex.setNumEdges(),
Vertex.setGraphMapper(), etc... which means that implemeting BasicVertex /
MutableVertex isn't good enough, you have to actually be a subclass of
Vertex! :\

  Problem 2) Vertex.class is directly referenced as the base class in places
like BspUtils, etc.

So some refactoring may be necessary to even try out this Object ->
primitive test (well, my code *compiles*, but won't run on account of
problems 1 and 2 above - runtime exceptions galore), even after I "faked" a
SortedMap by implementing a slimmed down delegator class.

Most likely the easiest solution would be to pull out a AbstractVertex
class, which holds this ugly static state, and make said state protected,
instead of private, and have Vertex and LongDoubleFloatDoubleVertex both
subclass it, and change reflection references from setClass(VERTEX_CLASS,
vertexClass, Vertex.class); to setClass(VERTEX_CLASS, vertexClass,

But the *right* solution is to remove that static stuff.  Why is it there,
anyways?  Why doesn't GraphMapper just call
directly on each of the BasicVertex's in serviceWorker.getVertexRangeMap()?


View raw message