mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Limon <julian.li...@tukipa.com>
Subject Re: Top items in a vector
Date Mon, 25 Apr 2011 22:01:16 GMT
Thanks a lot, everyone!

This is really helpful. I only expect to get the top k items out of an
n-sized vector. The heap should be small enough to fit in memory and (for my
case) it might be an overkill to try to parallelize it. I will implement
this using the algorithms that were suggested here. I also think it would be
great if there was a more general class that could be instantiated with a
vector, but for my specific case it's probably not needed.

Again, thanks everyone. I'm surprised of the level of engagement of this
community. This is definitely a huge plus to our decision to go with Mahout.

Thanks,

Julian

2011/4/25 Ted Dunning <ted.dunning@gmail.com>

> The simplest implementation is O(n log(k)) (inserts every element, then
> drops at most one element).
>
> Caching the k-th best element and only inserting new items has the same
> worst case behavior in ordered cases.
>
> There is a true O(n + k) algorithm based on quicksort (I think).
>
> On Mon, Apr 25, 2011 at 2:26 PM, Jake Mannix <jake.mannix@gmail.com>
> wrote:
>
> > On Mon, Apr 25, 2011 at 2:23 PM, Jake Mannix <jake.mannix@gmail.com>
> > wrote:
> > >
> > >
> > > This is O(n + k*log(k)) for a randomly sorted list.
> > >
> >
> > This is a bit sloppy: there's a multiplicative factor of log(n) also in
> the
> > second
> > additive factor, but that's still linear overall.
> >
> >  -jake
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message