mahout-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Profiling SequentialAccessSparseVector
Date Fri, 19 Feb 2010 14:52:44 GMT
On Fri, Feb 19, 2010 at 6:16 AM, Robin Anil <robin.anil@gmail.com> wrote:

> How much would the overhead of such an m/r be? HashMap/merge-sort based
> grouping, I assume?
>

Done with a HashMap, this is too expensive. On a Vector this is basically an
"inner loop" operation, so a HashMap is out for the same reason that Vectors
aren't based on HashMap anymore. It needs to be done efficiently *internal*
to the Vector impl, so that local data structures can be used properly.
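
To make the point concrete, here is a minimal sketch of what such an internal,
HashMap-free aggregation could look like for two sequential-access sparse
vectors. It is an illustration under stated assumptions, not Mahout code: the
SparseVec class, the aggregate method and its parameter order are hypothetical
names chosen to mirror the microMapReduce signature quoted below. It walks the
two sorted index arrays in a single merge pass and folds each mapped pair into
a running result, so no intermediate vector is ever held.

```java
import java.util.function.DoubleBinaryOperator;

public class MicroMapReduceSketch {

  /** Hypothetical stand-in for a sequential-access sparse vector:
   *  parallel arrays, with indices in strictly ascending order. */
  public static final class SparseVec {
    final int[] indices;
    final double[] values;

    public SparseVec(int[] indices, double[] values) {
      this.indices = indices;
      this.values = values;
    }
  }

  /**
   * Folds map(a[k], b[k]) into a running result, starting from unit, for
   * every index k that is non-zero in at least one of the two vectors.
   *
   * Assumption: indices where both vectors are zero are skipped, so this is
   * only correct when map(0, 0) leaves the aggregate unchanged (for example
   * map(0, 0) == 0 under a summing aggregator).
   */
  public static double aggregate(DoubleBinaryOperator aggregator, double unit,
                                 DoubleBinaryOperator map,
                                 SparseVec a, SparseVec b) {
    double result = unit;
    int i = 0;
    int j = 0;
    // Merge-style walk over the two sorted index arrays: no hashing and
    // no allocation inside the loop.
    while (i < a.indices.length || j < b.indices.length) {
      double x = 0.0;
      double y = 0.0;
      if (j >= b.indices.length
          || (i < a.indices.length && a.indices[i] < b.indices[j])) {
        x = a.values[i++];           // index present only in a
      } else if (i >= a.indices.length || b.indices[j] < a.indices[i]) {
        y = b.values[j++];           // index present only in b
      } else {
        x = a.values[i++];           // index present in both
        y = b.values[j++];
      }
      result = aggregator.applyAsDouble(result, map.applyAsDouble(x, y));
    }
    return result;
  }
}
```

For example, with a = {0: 1.0, 2: 3.0} and b = {2: 1.0, 5: 2.0}, calling
aggregate(Double::sum, 0.0, (x, y) -> (x - y) * (x - y), a, b) gives 9.0, the
squared distance, and aggregate(Double::sum, 0.0, (x, y) -> x - y, a, b) gives
1.0, the "sum of minus", in one pass over each vector.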

  -jake


>
> On Fri, Feb 19, 2010 at 5:28 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Actually, this makes the case that we should have something like:
> >
> >     microMapReduce(aggregatorFunction, aggregatorUnit, binaryMapFunction,
> >                    vectorA, vectorB)
> >
> > The name should be changed after its rhetorical effect has worn off.  As
> > the Chukwa guys tend to say, it's turtles all the way down.  We can have
> > map-reduce inside map-reduce.
> >
> > On Thu, Feb 18, 2010 at 3:41 PM, Robin Anil <robin.anil@gmail.com> wrote:
> >
> > > TODO: sum of minus to be optimised without having to hold the
> > > intermediate vector.
> > >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>
