From otis_gospodnetic@yahoo.com Thu Sep 11 10:42:36 2003 Return-Path: Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm Delivered-To: mailing list lucene-dev@jakarta.apache.org Received: (qmail 74831 invoked from network); 11 Sep 2003 10:42:36 -0000 Received: from unknown (HELO web12707.mail.yahoo.com) (216.136.173.244) by daedalus.apache.org with SMTP; 11 Sep 2003 10:42:36 -0000 Message-ID: <20030911104236.62348.qmail@web12707.mail.yahoo.com> Received: from [194.152.206.170] by web12707.mail.yahoo.com via HTTP; Thu, 11 Sep 2003 03:42:36 PDT Date: Thu, 11 Sep 2003 03:42:36 -0700 (PDT) From: Otis Gospodnetic Subject: Re: PATCH: PriorityQueue To: Lucene Developers List In-Reply-To: <3F575B7C.50301@detego-software.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N Thank you, this all looks correct and elegant to me. Patches applied, about to be checked in. Please keep this stuff coming, if you notice any other problems. Otis --- Christoph Goller wrote: > Hi folks, > > This is a suggestion how to make code with PriorityQueue more > consistent > and easier to understand. I am not talking about a bug. The current > code > seems to work fine. > > Currently there are two different kinds of applications of > PriorityQueue. > In SegmentMergeQueue, PhraseQueue, and TermPositionsQueue one knows > in advance the maximum number of elements in the queue and the queue > is > only used to order the elements. One is never tempted to add more > elements > than the specified number to the queue. With HitQueue one wants to > find the > n best hits within an a priori unknown number of hits. > > My problem: > > 1) PriorityQueue currently is initialized with a maximum size, > however it > actually can hold twice the number of elements. This feature is only > used > by HitQueue which currently must be able to hold one element more > than > maxSize. This is required by the code in IndexSearcher and > MultiSearcher. > > 2) I find the code in IndexSearcher and MultiSearcher for controlling > the > size of the queue a little bit confusing. Furthermore, if the queue > is already > full and a new element has a higher score than top(), it is more > efficient > to change the top-element and call adjustTop() than to add the new > element > and call pop() afterwards (what is currently done). > > Suggestion: > > IŽd like to introduce a clearer contract about the maxSize in > PriorityQueue. > PriorityQueue should be able to hold exactly maxSize elements and > put() > should throw an exception if one accidently tries to add more > elements. > Furthermore IŽd like to add a method insert() to PriorityQueue that > only adds > a new element if either the queue is not full or !lessThan(element, > top()). > This new method can then be used in IndexSearcher and MultiSearcher. > > Patch files for PriorityQueue, IndexSearcher, MultiSearcher, and > TestPriorityQueue (one additional test) are attached. > > kind regards, > Christoph > > -- > ***************************************************************** > * Dr. Christoph Goller Tel.: +49 89 203 45734 * > * Detego Software GmbH Mobile: +49 179 1128469 * > * Keuslinstr. 13 Fax.: +49 721 151516176 * > * 80798 München, Germany Email: goller@detego-software.de * > ***************************************************************** > > Index: PriorityQueue.java > =================================================================== > RCS file: > /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/util/PriorityQueue.java,v > retrieving revision 1.3 > diff -u -r1.3 PriorityQueue.java > --- PriorityQueue.java 7 Nov 2002 05:55:40 -0000 1.3 > +++ PriorityQueue.java 4 Sep 2003 13:29:40 -0000 > @@ -60,6 +60,7 @@ > public abstract class PriorityQueue { > private Object[] heap; > private int size; > + private int maxSize; > > /** Determines the ordering of objects in this priority queue. > Subclasses > must define this one method. */ > @@ -68,16 +69,41 @@ > /** Subclass constructors must call this. */ > protected final void initialize(int maxSize) { > size = 0; > - int heapSize = (maxSize * 2) + 1; > + int heapSize = maxSize + 1; > heap = new Object[heapSize]; > + this.maxSize = maxSize; > } > > - /** Adds an Object to a PriorityQueue in log(size) time. */ > + /** > + * Adds an Object to a PriorityQueue in log(size) time. > + * If one tries to add more objects than maxSize from initialize > + * a RuntimeException (ArrayIndexOutOfBound) is thrown. > + */ > public final void put(Object element) { > size++; > heap[size] = element; > upHeap(); > } > + > + /** > + * Adds element to the PriorityQueue in log(size) time if either > + * the PriorityQueue is not full, or !lessThan(element, top()). > + * @param element > + * @return true if element is added, false otherwise. > + */ > + public boolean insert(Object element){ > + if(size < maxSize){ > + put(element); > + return true; > + } > + else if(size > 0 && !lessThan(element, top())){ > + heap[1] = element; > + adjustTop(); > + return true; > + } > + else > + return false; > + } > > /** Returns the least element of the PriorityQueue in constant > time. */ > public final Object top() { > > Index: IndexSearcher.java > =================================================================== > RCS file: > /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/IndexSearcher.java,v > retrieving revision 1.9 > diff -u -r1.9 IndexSearcher.java > --- IndexSearcher.java 24 May 2003 15:24:26 -0000 1.9 > +++ IndexSearcher.java 4 Sep 2003 12:41:44 -0000 > @@ -133,18 +133,11 @@ > final HitQueue hq = new HitQueue(nDocs); > final int[] totalHits = new int[1]; > scorer.score(new HitCollector() { > - private float minScore = 0.0f; > public final void collect(int doc, float score) { > if (score > 0.0f && // ignore zeroed buckets > (bits==null || bits.get(doc))) { // skip docs not in bits > totalHits[0]++; > - if (score >= minScore) { > - hq.put(new ScoreDoc(doc, score)); // update hit queue > - if (hq.size() > nDocs) { // if hit queue overfull > - hq.pop(); // remove lowest in hit queue > - minScore = ((ScoreDoc)hq.top()).score; // reset minScore > - } > - } > + hq.insert(new ScoreDoc(doc, score)); > } > } > }, reader.maxDoc()); > > Index: MultiSearcher.java > =================================================================== > RCS file: > /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/search/MultiSearcher.java,v > retrieving revision 1.10 > diff -u -r1.10 MultiSearcher.java > --- MultiSearcher.java 29 Jan 2003 17:18:54 -0000 1.10 > +++ MultiSearcher.java 4 Sep 2003 12:43:13 -0000 > @@ -144,7 +144,6 @@ > public TopDocs search(Query query, Filter filter, int nDocs) > throws IOException { > HitQueue hq = new HitQueue(nDocs); > - float minScore = 0.0f; > int totalHits = 0; > > for (int i = 0; i < searchables.length; i++) { // search each > searcher > @@ -153,15 +152,9 @@ > ScoreDoc[] scoreDocs = docs.scoreDocs; > for (int j = 0; j < scoreDocs.length; j++) { // merge > scoreDocs into hq > ScoreDoc scoreDoc = scoreDocs[j]; > - if (scoreDoc.score >= minScore) { > - scoreDoc.doc += starts[i]; // convert doc > - hq.put(scoreDoc); // update hit queue > - if (hq.size() > nDocs) { // if hit queue overfull > - hq.pop(); // remove lowest in hit queue > - minScore = ((ScoreDoc)hq.top()).score; // reset minScore > - } > - } else > - break; // no more scores > minScore > + scoreDoc.doc += starts[i]; // convert > doc > + if(!hq.insert(scoreDoc)) > + break; // > no more scores > minScore > } > } > > > Index: TestPriorityQueue.java > =================================================================== > RCS file: > /home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/util/TestPriorityQueue.java,v > retrieving revision 1.3 > diff -u -r1.3 TestPriorityQueue.java > --- TestPriorityQueue.java 5 Jun 2002 01:50:54 -0000 1.3 > +++ TestPriorityQueue.java 4 Sep 2003 13:39:34 -0000 > @@ -135,4 +135,16 @@ > pq.clear(); > assertEquals(0, pq.size()); > } > + > + public void testFixedSize(){ > + PriorityQueue pq = new IntegerQueue(3); > + pq.insert(new Integer(2)); > + pq.insert(new Integer(3)); > + pq.insert(new Integer(1)); > + pq.insert(new Integer(5)); > + pq.insert(new Integer(7)); > + pq.insert(new Integer(1)); > + assertEquals(3, pq.size()); > + assertEquals(3, ((Integer) pq.top()).intValue()); > + } > } > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com