giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: LongDoubleFloatDoubleVertex
Date Fri, 01 Mar 2013 08:33:10 GMT
On Fri, Mar 1, 2013 at 5:13 AM, Alessandro Presta <alessandro@fb.com> wrote:

> On 2/28/13 5:08 PM, "Jake Mannix" <jake.mannix@gmail.com> wrote:
>
> >On Thu, Feb 28, 2013 at 4:18 PM, Alessandro Presta
> ><alessandro@fb.com>wrote:
> >
> >> It's not like it causes problems, it's just that it's a pretty big
> >> dependency to justify for a small example.
> >>
> >> As for the motivation, if your point is to prove the framework's
> >> superiority in some context, then you can use the simplest possible
> >> implementation (ArrayList).
> >>
> >
> >This takes up LOTS of memory.  Primitives rule, objects drool (memory).
>
> Ok, but having to copy the keys and values to external arrays (which is
> what you have to do with Mahout's hash map) is even worse. A good
> implementation of (long, double) edges (e.g. for RandomWalkVertex) is
> primitive arrays.
>
> >
> >
> >> The Giraph framework is all about iterating over edges, so an
> >> implementation that doesn't support that with reasonable efficiency
> >> doesn't make a lot of sense to me.
> >>
> >> It also follows that hash map-based implementations only make sense for
> >> algorithms that make use of mutations.
> >>
> >
> >Agreed, maps aren't absolutely necessary for immutable graphs.  But random
> >access to collections of integers _is_ necessary for many algorithms.
> >It's
> >not all just iteration over lists.
>
> Can you give me some concrete examples of algorithms where random access
> to the edges is required?
> I'm really interested in this, because I'm considering killing
> getEdgeValue() if there are no use cases.
>

My example would be typed/labeled graphs (e.g. RDF) for reasoning and
traversals based on particular patterns/path.


>
> >
> >
> >> That said, something like a Trove hash map would probably be more
> >> appropriate (more efficient than the standard Java HashMap, at the
> >>expense
> >> of generality). That could be a good candidate for a
> >> LongDoubleHashMapEdges implementation.
> >> I can give that a shot if it sounds good.
> >>
> >
> >Trove is LGPL, IIRC, so that doesn't work in Apache projects.
>
> Whoops, totally missed that part.
>
> >
> >What does Giraph depend on of Mahout now?  Just mahout-collections?
> >That's
> >not a very big dependency, and has all sorts of primitive-to-primitive
> >collections,
> >and from what I've seen benchmarked, just as good or better than Trove.
> > Carrot2's
> >hppc may be better yet, but I'm not sure if that is stably released yet.
>
> I'll take a look at HPPC and Mahout collections then. They all seem to
> provide the same stuff, so I'll consider benchmarks and convenience of the
> API. Thanks for the pointers.
>
> >
> >
> >>
> >> On 2/28/13 4:03 PM, "Jake Mannix" <jake.mannix@gmail.com> wrote:
> >>
> >> >Is the mahout dependency causing problems?
> >> >
> >> >It would be nice if we could actually implement some of the algorithms
> >> >that
> >> >Mahout does via map-reduce in Giraph's BSP formalism, to show off how
> >>it
> >> >improves things.  Using the Mahout primitives can show that it's not
> >>about
> >> >the inner loop implementation, but the framework itself...
> >> >
> >> >
> >> >On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman
> >> ><apache.mailbox@gmail.com>wrote:
> >> >
> >> >> I like the idea of refactoring it into something more appropriate
> >>for us
> >> >> and ditching the Mahout dep. Good looking out.
> >> >>
> >> >>
> >> >> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella <
> >> >> claudio.martella@gmail.com> wrote:
> >> >>
> >> >> > I agree, at this point we could have a RandomWalkVertex with edge
> >> >>values,
> >> >> > and a "null-edged" vertex for the PR benchmarks.
> >> >> > We make everybody happy and avoid code duplication.
> >> >> >
> >> >> >
> >> >> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta
> >><alessandro@fb.com
> >> >> > >wrote:
> >> >> >
> >> >> > > Hi Gianmarco,
> >> >> > >
> >> >> > > Yes, there will be more efficient implementations.
> >> >> > > In the redesign I'm working on (GIRAPH-528), there will be
only
> >>one
> >> >> > Vertex
> >> >> > > class and edge storage is delegated to a VertexEdges class.
> >> >> > > So far I'm adding some generic implementations (ByteArrayEdges,
> >> >> > > ArrayListEdges, HashMapEdges) that work for all types, and
some
> >> >> optimized
> >> >> > > ones (LongDoubleArrayEdges, LongNullArrayEdges).
> >> >> > >
> >> >> > > Do you specifically need edge values to be float while the
other
> >> >>types
> >> >> > are
> >> >> > > double?
> >> >> > > It seems to me it would make sense to change RandomWalkVertex
to
> >>use
> >> >> > > double edge values instead, and avoid code duplication (i.e.
> >>adding
> >> >>a
> >> >> > > LongFloatArrayEdges that's basically the same). We're not
Trove
> >> >>after
> >> >> > all.
> >> >> > > Makes sense?
> >> >> > >
> >> >> > > Thanks for the feedback,
> >> >> > >
> >> >> > > Alessandro
> >> >> > >
> >> >> > >
> >> >> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales"
> >> >><gdfm@apache.org>
> >> >> > > wrote:
> >> >> > >
> >> >> > > >Hi,
> >> >> > > >
> >> >> > > >Maybe the specific implementation can be thrown away,
but
> >> >>personally I
> >> >> > > >feel
> >> >> > > >very strongly for the need of a good LongDoubleFloatDouble
> >>vertex.
> >> >> > > >It's the base for any serious random walk algorithm.
> >> >> > > >
> >> >> > > >I would call for a refactoring rather than a removal.
> >> >> > > >
> >> >> > > >Just my 2c.
> >> >> > > >
> >> >> > > >Cheers,
> >> >> > > >
> >> >> > > >--
> >> >> > > >Gianmarco
> >> >> > > >
> >> >> > > >
> >> >> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta
> >> >> > > ><alessandro@fb.com>wrote:
> >> >> > > >
> >> >> > > >> Hi all,
> >> >> > > >>
> >> >> > > >> Does anyone feel strongly for LongDoubleFloatDoubleVertex?
> >> >> > > >> Reasons why I think it should be removed:
> >> >> > > >>
> >> >> > > >>   1.  Right now it's incorrect (returns target vertex
id as
> >>edge
> >> >> > value).
> >> >> > > >>   2.  Iteration will always be inefficient, since
the
> >>underlying
> >> >> > Mahout
> >> >> > > >> open-addressing hash map implementation doesn't
provide
> >> >>iterators.
> >> >> It
> >> >> > > >> provides a way to copy the keys and values to external
> >> >>arrays/lists.
> >> >> > > >>   3.  It's the only reason why we have Mahout as
a dependency.
> >> >> > > >>
> >> >> > > >> I think we should strive to provide model implementations
that
> >> >>are
> >> >> > > >>generic
> >> >> > > >> and/or extremely efficient. This one satisfies neither.
> >> >> > > >>
> >> >> > > >> Thanks,
> >> >> > > >>
> >> >> > > >> Alessandro
> >> >> > > >>
> >> >> > >
> >> >> > >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >    Claudio Martella
> >> >> >    claudio.martella@gmail.com
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >
> >> >  -jake
> >>
> >>
> >
> >
> >--
> >
> >  -jake
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message