giraph-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: LongDoubleFloatDoubleVertex
Date Fri, 01 Mar 2013 00:03:47 GMT
Is the mahout dependency causing problems?

It would be nice if we could actually implement some of the algorithms that
Mahout does via map-reduce in Giraph's BSP formalism, to show off how it
improves things.  Using the Mahout primitives can show that it's not about
the inner loop implementation, but the framework itself...


On Thu, Feb 28, 2013 at 1:55 PM, Eli Reisman <apache.mailbox@gmail.com> wrote:

> I like the idea of refactoring it into something more appropriate for us
> and ditching the Mahout dep. Good looking out.
>
>
> On Thu, Feb 28, 2013 at 10:15 AM, Claudio Martella <
> claudio.martella@gmail.com> wrote:
>
> > I agree, at this point we could have a RandomWalkVertex with edge values,
> > and a "null-edged" vertex for the PR benchmarks.
> > We make everybody happy and avoid code duplication.
> >
> >
> > On Thu, Feb 28, 2013 at 7:12 PM, Alessandro Presta <alessandro@fb.com>
> > wrote:
> >
> > > Hi Gianmarco,
> > >
> > > Yes, there will be more efficient implementations.
> > > In the redesign I'm working on (GIRAPH-528), there will be only one
> > > Vertex class and edge storage is delegated to a VertexEdges class.
> > > So far I'm adding some generic implementations (ByteArrayEdges,
> > > ArrayListEdges, HashMapEdges) that work for all types, and some
> > > optimized ones (LongDoubleArrayEdges, LongNullArrayEdges).
> > >
> > > Do you specifically need edge values to be float while the other types
> > > are double?
> > > It seems to me it would make sense to change RandomWalkVertex to use
> > > double edge values instead, and avoid code duplication (i.e. adding a
> > > LongFloatArrayEdges that's basically the same). We're not Trove after
> > > all.
> > > Makes sense?
> > >
> > > Thanks for the feedback,
> > >
> > > Alessandro
> > >
> > >
> > > On 2/28/13 1:54 AM, "Gianmarco De Francisci Morales" <gdfm@apache.org>
> > > wrote:
> > >
> > > >Hi,
> > > >
> > > >Maybe the specific implementation can be thrown away, but personally I
> > > >feel very strongly about the need for a good LongDoubleFloatDouble
> > > >vertex.
> > > >It's the base for any serious random walk algorithm.
> > > >
> > > >I would call for a refactoring rather than a removal.
> > > >
> > > >Just my 2c.
> > > >
> > > >Cheers,
> > > >
> > > >--
> > > >Gianmarco
> > > >
> > > >
> > > >On Thu, Feb 28, 2013 at 7:54 AM, Alessandro Presta
> > > ><alessandro@fb.com> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Does anyone feel strongly about LongDoubleFloatDoubleVertex?
> > > >> Reasons why I think it should be removed:
> > > >>
> > > >>   1.  Right now it's incorrect (returns target vertex id as edge
> > > >>       value).
> > > >>   2.  Iteration will always be inefficient, since the underlying
> > > >>       Mahout open-addressing hash map implementation doesn't provide
> > > >>       iterators. It provides a way to copy the keys and values to
> > > >>       external arrays/lists.
> > > >>   3.  It's the only reason why we have Mahout as a dependency.
> > > >>
> > > >> I think we should strive to provide model implementations that are
> > > >> generic and/or extremely efficient. This one satisfies neither.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Alessandro
> > > >>
> > >
> > >
> >
> >
> > --
> >    Claudio Martella
> >    claudio.martella@gmail.com
> >
>



-- 

  -jake
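
For readers following the thread, here is a minimal sketch of the kind of
array-backed edge storage discussed above: long target ids and double edge
values kept in parallel primitive arrays, with direct iteration and no
copy-out step. The class name LongDoubleEdgesSketch, its nested Edge type,
and all method names are hypothetical and chosen for illustration only; this
is not the LongDoubleArrayEdges class from GIRAPH-528 and does not use any
Giraph or Mahout API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of array-backed edge storage; not a Giraph class. */
final class LongDoubleEdgesSketch implements Iterable<LongDoubleEdgesSketch.Edge> {

    /** Immutable (target id, edge value) pair handed out during iteration. */
    static final class Edge {
        final long targetId;
        final double value;

        Edge(long targetId, double value) {
            this.targetId = targetId;
            this.value = value;
        }
    }

    // Parallel primitive arrays: slot i holds the i-th edge's target and value.
    private long[] targets = new long[4];
    private double[] values = new double[4];
    private int size = 0;

    /** Appends an edge, doubling the backing arrays when they are full. */
    void add(long targetId, double value) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = targetId;
        values[size] = value;
        size++;
    }

    int size() {
        return size;
    }

    /** Iterates in insertion order without copying keys or values elsewhere. */
    @Override
    public Iterator<Edge> iterator() {
        return new Iterator<Edge>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < size;
            }

            @Override
            public Edge next() {
                if (index >= size) {
                    throw new NoSuchElementException();
                }
                // The edge value comes from values[], never from the target id.
                Edge edge = new Edge(targets[index], values[index]);
                index++;
                return edge;
            }
        };
    }

    /** Sums all edge values, e.g. to normalize random-walk transition weights. */
    double sumOfValues() {
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        LongDoubleEdgesSketch edges = new LongDoubleEdgesSketch();
        edges.add(7L, 0.25);
        edges.add(9L, 0.75);
        for (Edge e : edges) {
            System.out.println(e.targetId + " -> " + e.value);
        }
    }
}
```

The sketch allocates one small Edge object per step for readability; an
implementation tuned for efficiency would more likely reuse a single mutable
edge object or expose the primitives directly.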
