giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alessandro Presta <alessan...@fb.com>
Subject Re: LongDoubleFloatDoubleVertex
Date Fri, 01 Mar 2013 08:37:03 GMT
On 3/1/13 12:33 AM, "Claudio Martella" <claudio.martella@gmail.com> wrote:

>On Fri, Mar 1, 2013 at 5:13 AM, Alessandro Presta <alessandro@fb.com>
>wrote:
>
>> On 2/28/13 5:08 PM, "Jake Mannix" <jake.mannix@gmail.com> wrote:
>>
>> >On Thu, Feb 28, 2013 at 4:18 PM, Alessandro Presta
>> ><alessandro@fb.com>wrote:
>> >
>> >> It's not like it causes problems, it's just that it's a pretty big
>> >> dependency to justify for a small example.
>> >>
>> >> As for the motivation, if your point is to prove the framework's
>> >> superiority in some context, then you can use the simplest possible
>> >> implementation (ArrayList).
>> >>
>> >
>> >This takes up LOTS of memory.  Primitives rule, objects drool (memory).
>>
>> Ok, but having to copy the keys and values to external arrays (which is
>> what you have to do with Mahout's hash map) is even worse. A good
>> implementation of (long, double) edges (e.g. for RandomWalkVertex) is
>> primitive arrays.
>>
>> >
>> >
>> >> The Giraph framework is all about iterating over edges, so an
>> >> implementation that doesn't support that with reasonable efficiency
>> >> doesn't make a lot of sense to me.
>> >>
>> >> It also follows that hash map-based implementations only make sense
>>for
>> >> algorithms that make use of mutations.
>> >>
>> >
>> >Agreed, maps aren't absolutely necessary for immutable graphs.  But
>>random
>> >access to collections of integers _is_ necessary for many algorithms.
>> >It's
>> >not all just iteration over lists.
>>
>> Can you give me some concrete examples of algorithms where random access
>> to the edges is required?
>> I'm really interested in this, because I'm considering killing
>> getEdgeValue() if there are no use cases.
>>
>
>My example would be typed/labeled graphs (e.g. RDF) for reasoning and
>traversals based on particular patterns/path.
>

Can you be a bit more specific? I'm interested in knowing if this
algorithm would call getEdgeValue(id) for some target vertex id.

>
>>
>> >
>> >
>> >> That said, something like a Trove hash map would probably be more
>> >> appropriate (more efficient than the standard Java HashMap, at the
>> >>expense
>> >> of generality). That could be a good candidate for a
>> >> LongDoubleHashMapEdges implementation.
>> >> I can give that a shot if it sounds good.
>> >>
>> >
>> >Trove is LGPL, IIRC, so that doesn't work in Apache projects.
>>
>> Whoops, totally missed that part.
>>
>> >
>> >What does Giraph depend on of Mahout now?  Just mahout-collections?
>> >That's
>> >not a very big dependency, and has all sorts of primitive-to-primitive
>> >collections,
>> >and from what I've seen benchmarked, just as good or better than Trove.
>> > Carrot2's
>> >hppc may be better yet, but I'm not sure if that is stably released
>>yet.
>>
>> I'll take a look at HPPC and Mahout collections then. They all seem to
>> provide the same stuff, so I'll consider benchmarks and convenience of
>>the
>> API. Thanks for the pointers.
>>
>> >
>> >
>> >>
>> >> On 2/28/13 4:03 PM, "Jake Mannix" <jake.mannix@gmail.com> wrote:
>> >>
>> >> >Is the mahout dependency causing problems?
>> >> >
>> >> >It would be nice if we could actually implement some of the
>>algorithms
>> >> >that
>> >> >Mahout does via map-reduce in Giraph's BSP formalism, to show off
>>how
>> >>it
>> >> >improves things.  Using the Mahout primitives can show that it's not
>> >>about
>> >> >the inner loop implementation, but the framework itself...
>> >> >
>> >> >
>> >> >On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman
>> >> ><apache.mailbox@gmail.com>wrote:
>> >> >
>> >> >> I like the idea of refactoring it into something more appropriate
>> >>for us
>> >> >> and ditching the Mahout dep. Good looking out.
>> >> >>
>> >> >>
>> >> >> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella <
>> >> >> claudio.martella@gmail.com> wrote:
>> >> >>
>> >> >> > I agree, at this point we could have a RandomWalkVertex with
>>edge
>> >> >>values,
>> >> >> > and a "null-edged" vertex for the PR benchmarks.
>> >> >> > We make everybody happy and avoid code duplication.
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta
>> >><alessandro@fb.com
>> >> >> > >wrote:
>> >> >> >
>> >> >> > > Hi Gianmarco,
>> >> >> > >
>> >> >> > > Yes, there will be more efficient implementations.
>> >> >> > > In the redesign I'm working on (GIRAPH-528), there will
be
>>only
>> >>one
>> >> >> > Vertex
>> >> >> > > class and edge storage is delegated to a VertexEdges
class.
>> >> >> > > So far I'm adding some generic implementations
>>(ByteArrayEdges,
>> >> >> > > ArrayListEdges, HashMapEdges) that work for all types,
and
>>some
>> >> >> optimized
>> >> >> > > ones (LongDoubleArrayEdges, LongNullArrayEdges).
>> >> >> > >
>> >> >> > > Do you specifically need edge values to be float while
the
>>other
>> >> >>types
>> >> >> > are
>> >> >> > > double?
>> >> >> > > It seems to me it would make sense to change RandomWalkVertex
>>to
>> >>use
>> >> >> > > double edge values instead, and avoid code duplication
(i.e.
>> >>adding
>> >> >>a
>> >> >> > > LongFloatArrayEdges that's basically the same). We're
not
>>Trove
>> >> >>after
>> >> >> > all.
>> >> >> > > Makes sense?
>> >> >> > >
>> >> >> > > Thanks for the feedback,
>> >> >> > >
>> >> >> > > Alessandro
>> >> >> > >
>> >> >> > >
>> >> >> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales"
>> >> >><gdfm@apache.org>
>> >> >> > > wrote:
>> >> >> > >
>> >> >> > > >Hi,
>> >> >> > > >
>> >> >> > > >Maybe the specific implementation can be thrown away,
but
>> >> >>personally I
>> >> >> > > >feel
>> >> >> > > >very strongly for the need of a good LongDoubleFloatDouble
>> >>vertex.
>> >> >> > > >It's the base for any serious random walk algorithm.
>> >> >> > > >
>> >> >> > > >I would call for a refactoring rather than a removal.
>> >> >> > > >
>> >> >> > > >Just my 2c.
>> >> >> > > >
>> >> >> > > >Cheers,
>> >> >> > > >
>> >> >> > > >--
>> >> >> > > >Gianmarco
>> >> >> > > >
>> >> >> > > >
>> >> >> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta
>> >> >> > > ><alessandro@fb.com>wrote:
>> >> >> > > >
>> >> >> > > >> Hi all,
>> >> >> > > >>
>> >> >> > > >> Does anyone feel strongly for LongDoubleFloatDoubleVertex?
>> >> >> > > >> Reasons why I think it should be removed:
>> >> >> > > >>
>> >> >> > > >>   1.  Right now it's incorrect (returns target
vertex id as
>> >>edge
>> >> >> > value).
>> >> >> > > >>   2.  Iteration will always be inefficient,
since the
>> >>underlying
>> >> >> > Mahout
>> >> >> > > >> open-addressing hash map implementation doesn't
provide
>> >> >>iterators.
>> >> >> It
>> >> >> > > >> provides a way to copy the keys and values to
external
>> >> >>arrays/lists.
>> >> >> > > >>   3.  It's the only reason why we have Mahout
as a
>>dependency.
>> >> >> > > >>
>> >> >> > > >> I think we should strive to provide model implementations
>>that
>> >> >>are
>> >> >> > > >>generic
>> >> >> > > >> and/or extremely efficient. This one satisfies
neither.
>> >> >> > > >>
>> >> >> > > >> Thanks,
>> >> >> > > >>
>> >> >> > > >> Alessandro
>> >> >> > > >>
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> >    Claudio Martella
>> >> >> >    claudio.martella@gmail.com
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >--
>> >> >
>> >> >  -jake
>> >>
>> >>
>> >
>> >
>> >--
>> >
>> >  -jake
>>
>>
>
>
>-- 
>   Claudio Martella
>   claudio.martella@gmail.com


Mime
View raw message