Date: Thu, 29 Mar 2007 09:39:13 -0400
From: "Peter Keegan"
To: java-user@lucene.apache.org
Subject: FieldSortedHitQueue enhancement

This is a request for an enhancement to FieldSortedHitQueue/PriorityQueue that would prevent duplicate documents from being inserted or, alternatively, allow the application to prevent this (reason explained below). I can do this today by making the 'lessThan' method public and checking the queue before inserting, like this:

    if (hq.size() < maxSize) {
        // doc will be inserted into queue - check for duplicate before inserting
    } else if (hq.size() > 0 && !hq.lessThan((ScoreDoc)fieldDoc, (ScoreDoc)hq.top())) {
        // doc will be inserted into queue - check for duplicate before inserting
    } else {
        // doc will not be inserted - no check needed
    }

However, this just replicates existing code in PriorityQueue->insert(). An alternative would be to have a method like:

    public boolean wouldBeInserted(ScoreDoc doc)
    // returns true if doc would be inserted, without inserting

The reason for this is that I have some queries that get expanded into multiple searches, and the resulting hits are OR'd together. The queries contain 'terms' that are not seen by Lucene but are handled by a HitCollector that uses external data for each document to evaluate hits.
The results from the priority queue should contain no duplicate documents (first or last doc wins).

Do any of these suggestions seem reasonable? So far, I've been able to use Lucene without any modifications, and I hope to continue this way.

Peter
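
For illustration, here is a minimal, self-contained sketch of how a wouldBeInserted-style check could combine with duplicate tracking. It does not use Lucene. DedupHitQueue, Hit, and insertUnique are hypothetical names, java.util.PriorityQueue stands in for Lucene's queue, and the ordering (score ascending, ties broken by document number) is an assumption, not the behavior of FieldSortedHitQueue. It implements the "first doc wins" policy.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical bounded hit queue that rejects duplicate documents (not a Lucene class). */
class DedupHitQueue {
    /** Minimal hit record; plays the role that ScoreDoc plays in the message above. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxSize;
    // Min-heap: the weakest retained hit is at the head, like top() in the message.
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(DedupHitQueue::compare);
    // Document numbers currently held in the queue.
    private final Set<Integer> docsInQueue = new HashSet<>();

    DedupHitQueue(int maxSize) { this.maxSize = maxSize; }

    // Assumed ordering: lower score is "less"; on equal scores, the higher doc is "less".
    private static int compare(Hit a, Hit b) {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    }

    private static boolean lessThan(Hit a, Hit b) { return compare(a, b) < 0; }

    /** Returns true if the hit would be accepted by the bounded queue, without inserting it. */
    boolean wouldBeInserted(Hit hit) {
        if (heap.size() < maxSize) {
            return true;                                  // room left
        }
        return !heap.isEmpty() && !lessThan(hit, heap.peek()); // not weaker than the weakest hit
    }

    /** Inserts the hit unless it would be rejected or its doc is already queued. */
    boolean insertUnique(Hit hit) {
        if (!wouldBeInserted(hit)) {
            return false;                                 // no duplicate check needed
        }
        if (docsInQueue.contains(hit.doc)) {
            return false;                                 // duplicate: first doc wins
        }
        if (heap.size() >= maxSize) {
            Hit evicted = heap.poll();                    // drop the weakest hit
            docsInQueue.remove(evicted.doc);
        }
        heap.add(hit);
        docsInQueue.add(hit.doc);
        return true;
    }

    int size() { return heap.size(); }

    public static void main(String[] args) {
        DedupHitQueue q = new DedupHitQueue(2);
        System.out.println(q.insertUnique(new Hit(1, 0.5f)));
        System.out.println(q.insertUnique(new Hit(1, 0.5f))); // same doc offered by a second search
        System.out.println(q.size());
    }
}
```

The duplicate lookup runs only for hits that would actually enter the queue, which is the saving the proposed method is meant to provide. A "last doc wins" policy would need to remove the earlier entry from the heap instead of rejecting the new one.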