giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avery Ching <ach...@apache.org>
Subject Re: Primitives vs Objects (the Movie!)
Date Wed, 07 Sep 2011 06:33:20 GMT
On 9/6/11 10:05 PM, Jake Mannix wrote:
>
>
> On Wed, Sep 7, 2011 at 1:02 AM, Avery Ching <aching@apache.org 
> <mailto:aching@apache.org>> wrote:
>
>>
>>     Yeah, one edge is pretty silly.  To get some real numbers, I
>>     should try it out with a more realistic (power-law distributed)
>>     bit of synthetic data.
>     Agreed.
>
>
> I'll see if I can write up some simple extensions of the PageRank 
> benchmark to test this out.  Added 
> https://issues.apache.org/jira/browse/GIRAPH-26 to track it.
>
>     Lots of questions here, I'll try my best. =)  Basically the idea
>     is that as a sorted map, the user would be able to know that the
>     edges are sorted by their destination ids.  Range based edge
>     queries are possible (i.e. the edges with destinations from
>     com.yahoo.www to com.zillow.www).  Basically this would give the
>     users a bit more functionality than a basic map.  A list would
>     require a full scan to find/remove an edge.
>
>
> I see, that makes sense.  I wonder if inverting the API a bit would 
> make sense, instead of exposing such a concrete inner domain object 
> like the actual SortedMap?  By this, I mean: if you want to search / 
> subselect edges by some criterion, then instead of having the caller do
>
>   for(Edge<E,V> edge : getOutEdgeMap().values()) {
>     // select edges you want to do something with
>   }
>
>   you expose to the caller:
>
>   public Collection<Edge<E,V>> filterEdges(Predicate<Edge<E,V>>

> filter), which the caller can use like
>
>   for(Edge<E,V> edge : filterEdges(new Predicate<Edge<E,V>>() {
>     boolean apply(Edge<E,V> e) {
>       return e.getVertexId().compareTo("com.zillow.www") >= 0 && 
> e.getVertexId().compareTo("com.yahoo.www") < 0;
>     }
>   }) {
>     // do stuff on the edges you wanted to get
>   }
>
>   I'm not sure that this is precisely the right API, but exposing the 
> inner SortedMap definitely has a "leaky abstraction" smell to it to 
> me, especially where there are no examples or algorithm 
> implementations in the codebase which currently *need* this to be 
> exposed in such a way.
We could expose just the Map and provide helper methods as you suggest.  
It's a thought.  You're right that it's not used in this way in any 
examples.

>     As far as why not (Sorted)Map<I, E>, I believe we actually used to
>     have that interface and I changed it use the Edge class.  This was
>     primarily done since Edge objects would be used for edge requests
>     (i.e. void addEdgeRequest(I sourceVertexId, Edge<I, E> edge)
>     stores the add edge request with the Edge object in a list.  I
>     think I also changed the user interfaces to use the Edge object
>     since I initially thought it might make usage it a little clearer
>     for users (i.e. edge.getDestVertexid() and some of the
>     serialization a little simpler, but looking at it now, that might
>     not be the case.  We can probably go back to Map<I, E> or
>     SortedMap<I, E> to save some memory and internally use the Edge
>     object to store the add edge requests.
>
>
> I can see how the Edge object is nice, it does seem like a better api 
> then basically just a Map<I,E>.  Why not a sorted List<Edge<I,E>>,

> then, where the Comparator ignores the edge value, but only sorts by 
> the dest vertexId?  Because the contract of List doesn't *require* the 
> sort order to remain fixed, I guess?  This yet again underscores for 
> me the "spidey-sense" wrongness of exposing the SortedMap exactly as 
> it is.  Maybe just expose the methods on a map that are expected to be 
> needed, and methods can be added as time goes on?
I think Map over List makes sense for the edges to find them quickly (my 
personal opinion).

>>     For my present case, I can probably hack around it by making a
>>     virtual
>>     SortedMap<LongWritable, Edge<LongWritable,FloatWritable>> which
>>     implements that interface, yet is backed by a
>>     primitive map, but I'm trying to understand what the API is
>>     trying to support...
>     Hope my explanation helped somewhat.
>
>
> Yes, definitely.  But I've run into some ugly ugliness when I tried to 
> run my code:
>
>   Problem 1)  static mutable state (ew!) - lots of references to 
> Vertex.setSuperStep(), Vertex.setNumVertices(), Vertex.setNumEdges(), 
> Vertex.setGraphMapper(), etc... which means that implemeting 
> BasicVertex / MutableVertex isn't good enough, you have to actually be 
> a subclass of Vertex! :\
>
Oh, you found our dirty little secret. =)  Yes, we do have static 
mutable state.  The concern is that if every vertex actually keeps track 
of all those things as members, I think a decent amount of memory would 
be wasted.  I'm sure there is a cleaner way to implement this though 
(i.e. Flyweight pattern perhaps), just never got around to fixing it.
>   Problem 2) Vertex.class is directly referenced as the base class in 
> places like BspUtils, etc.
>
We never expected to have another base Vertex class. =)
> So some refactoring may be necessary to even try out this Object -> 
> primitive test (well, my code *compiles*, but won't run on account of 
> problems 1 and 2 above - runtime exceptions galore), even after I 
> "faked" a SortedMap by implementing a slimmed down delegator class.
>
> Most likely the easiest solution would be to pull out a AbstractVertex 
> class, which holds this ugly static state, and make said state 
> protected, instead of private, and have Vertex and 
> LongDoubleFloatDoubleVertex both subclass it, and change reflection 
> references from setClass(VERTEX_CLASS, vertexClass, Vertex.class); 
> to setClass(VERTEX_CLASS, vertexClass, AbstractVertex.class);
>
That sounds reasonable.  One thing we have thought about is compressing 
the vertices in memory at the cost of some CPU.  While this is 
orthogonal, they are both efforts to conserve memory.
> But the *right* solution is to remove that static stuff.  Why is it 
> there, anyways?  Why doesn't GraphMapper just call
>             Vertex.setSuperstep(superstep);
>             Vertex.setNumVertices(serviceWorker.getTotalVertices());
>             Vertex.setNumEdges(serviceWorker.getTotalEdges());
> directly on each of the BasicVertex's 
> in serviceWorker.getVertexRangeMap()?
>
Memory consumption (see above), these are aggregate members for all the 
vertices.
>   -jake


Mime
View raw message