mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: PriorityQueue in NearestNUserNeighborhood
Date Fri, 07 Nov 2008 14:05:31 GMT
Generally in this sort of structure, you are keeping the top N items.
Consider top == biggest.

The required operations are

1) compare to smallest (usually this is done by caching the smallest saved
score to avoid touching the queue structure)

2) insertion of new item larger than smallest (because step (1) avoids the
need to insert items that are not at least as big as the smallest)

3) delete smallest item if size is greater than limit.

4) traverse the elements of the queue in sorted order for final disposition
(either largest first or smallest first)

I usually use a heap for this, but a priority queue works just as well as
long as highest priority == lowest score.

Conveniently, the Java implementation of PriorityQueue considers smallest to
be highest priority.

Why do you say that you need access to the other end of the queue as well?

On Fri, Nov 7, 2008 at 2:13 AM, Sean Owen <srowen@gmail.com> wrote:

> The only catch there as I remember is the need to know the smallest
> element -- but also the need to limit the size and therefore know the
> largest element too. So I think that's why I couldn't use the data
> structure. I could be missing something.
>
> On Thu, Nov 6, 2008 at 10:47 PM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
> > Hi,
> >
> > I see this in the NearestNUserNeighborhood:
> >
> >          ListIterator<UserCorrelationPair> iterator =
> queue.listIterator(queue.size());
> >          while (iterator.hasPrevious()) {
> >            if (theCorrelation <= iterator.previous().theCorrelation) {
> >              iterator.next();
> >              break;
> >            }
> >          }
> >
> > This looks like a priority queue.... if so, I wonder if it would be
> clearer to just use the java.util.PriorityQueue or maybe even a variation of
> the one in Lucene?  Not sure if there's a need for this change, just
> thinking our loud as I browse Taste code.
> >
> > Oh, I see Hadoop has its own PQ, too:
> >
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/util/PriorityQueue.html
> > which seems to be a slightly modified variation of the Lucene version:
> >
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/util/PriorityQueue.html
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
>



-- 
ted

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message