mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: Top items in a vector
Date Mon, 25 Apr 2011 21:40:21 GMT
On Mon, Apr 25, 2011 at 2:33 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> There are two meanings of top floating around in this thread.
>
> I think that the OP was asking for "find the indexes of the largest
> elements
> of a vector".  That is definitely what Sean answered.
>

That's definitely what I was talking about as well.

  -jake


>
> I think that Dmitriy mean "find the most common elements of a collection".
>  That is definitely harder and needs slight cleverness.
>
> On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
>
> > if you "forget" some of the items you saw along with the counters,
> > how'd you add them back to the heap?
> >
> >
> > On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <jake.mannix@gmail.com>
> > wrote:
> > > On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > wrote:
> > >
> > >> There's nothing in Mahout to do this (afaik).
> > >>
> > >> There's a statistical method for this doing that in linear time, which
> > >> is very easy to implement (and btw would make an excellent addition to
> > >> Mahout), google up 'countsketch'.
> > >>
> > >
> > > Taking top k items in a list of length n, when n >> k, is already O(n),
> > if
> > > you check the bottom of heap first before inserting.  Not sure if
> > > countsketch
> > > is necessary for the usual use case.
> > >
> > >  -jake
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message