giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alessandro Presta <alessan...@fb.com>
Subject Re: LongDoubleFloatDoubleVertex
Date Fri, 01 Mar 2013 00:18:59 GMT
It's not like it causes problems, it's just that it's a pretty big
dependency to justify for a small example.

As for the motivation, if your point is to prove the framework's
superiority in some context, then you can use the simplest possible
implementation (ArrayList).
The Giraph framework is all about iterating over edges, so an
implementation that doesn't support that with reasonable efficiency
doesn't make a lot of sense to me.

It also follows that hash map-based implementations only make sense for
algorithms that make use of mutations.

That said, something like a Trove hash map would probably be more
appropriate (more efficient than the standard Java HashMap, at the expense
of generality). That could be a good candidate for a
LongDoubleHashMapEdges implementation.
I can give that a shot if it sounds good.

On 2/28/13 4:03 PM, "Jake Mannix" <jake.mannix@gmail.com> wrote:

>Is the mahout dependency causing problems?
>
>It would be nice if we could actually implement some of the algorithms
>that
>Mahout does via map-reduce in Giraph's BSP formalism, to show off how it
>improves things.  Using the Mahout primitives can show that it's not about
>the inner loop implementation, but the framework itself...
>
>
>On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman
><apache.mailbox@gmail.com>wrote:
>
>> I like the idea of refactoring it into something more appropriate for us
>> and ditching the Mahout dep. Good looking out.
>>
>>
>> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella <
>> claudio.martella@gmail.com> wrote:
>>
>> > I agree, at this point we could have a RandomWalkVertex with edge
>>values,
>> > and a "null-edged" vertex for the PR benchmarks.
>> > We make everybody happy and avoid code duplication.
>> >
>> >
>> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta <alessandro@fb.com
>> > >wrote:
>> >
>> > > Hi Gianmarco,
>> > >
>> > > Yes, there will be more efficient implementations.
>> > > In the redesign I'm working on (GIRAPH-528), there will be only one
>> > Vertex
>> > > class and edge storage is delegated to a VertexEdges class.
>> > > So far I'm adding some generic implementations (ByteArrayEdges,
>> > > ArrayListEdges, HashMapEdges) that work for all types, and some
>> optimized
>> > > ones (LongDoubleArrayEdges, LongNullArrayEdges).
>> > >
>> > > Do you specifically need edge values to be float while the other
>>types
>> > are
>> > > double?
>> > > It seems to me it would make sense to change RandomWalkVertex to use
>> > > double edge values instead, and avoid code duplication (i.e. adding
>>a
>> > > LongFloatArrayEdges that's basically the same). We're not Trove
>>after
>> > all.
>> > > Makes sense?
>> > >
>> > > Thanks for the feedback,
>> > >
>> > > Alessandro
>> > >
>> > >
>> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales"
>><gdfm@apache.org>
>> > > wrote:
>> > >
>> > > >Hi,
>> > > >
>> > > >Maybe the specific implementation can be thrown away, but
>>personally I
>> > > >feel
>> > > >very strongly for the need of a good LongDoubleFloatDouble vertex.
>> > > >It's the base for any serious random walk algorithm.
>> > > >
>> > > >I would call for a refactoring rather than a removal.
>> > > >
>> > > >Just my 2c.
>> > > >
>> > > >Cheers,
>> > > >
>> > > >--
>> > > >Gianmarco
>> > > >
>> > > >
>> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta
>> > > ><alessandro@fb.com>wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> Does anyone feel strongly for LongDoubleFloatDoubleVertex?
>> > > >> Reasons why I think it should be removed:
>> > > >>
>> > > >>   1.  Right now it's incorrect (returns target vertex id as edge
>> > value).
>> > > >>   2.  Iteration will always be inefficient, since the underlying
>> > Mahout
>> > > >> open-addressing hash map implementation doesn't provide
>>iterators.
>> It
>> > > >> provides a way to copy the keys and values to external
>>arrays/lists.
>> > > >>   3.  It's the only reason why we have Mahout as a dependency.
>> > > >>
>> > > >> I think we should strive to provide model implementations that
>>are
>> > > >>generic
>> > > >> and/or extremely efficient. This one satisfies neither.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Alessandro
>> > > >>
>> > >
>> > >
>> >
>> >
>> > --
>> >    Claudio Martella
>> >    claudio.martella@gmail.com
>> >
>>
>
>
>
>-- 
>
>  -jake


Mime
View raw message