From: Erik Hatcher
Subject: Re: clustering results
Date: Sat, 10 Apr 2004 03:35:48 -0400
To: "Lucene Users List"

On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote:
> I have an index of urls, and need to display the top 10 results for a
> given query, but want to display only 1 result per domain.
> It seems that using either Hits or a HitCollector, I'll need to access
> the doc, grab the domain field (I'll have it parsed ahead of time) and
> only take/display documents that are unique.
>
> A significant percentage of the time I expect I may have to access
> thousands of results before I find 10 in unique domains. Is there a
> faster approach that won't require accessing thousands of documents?

I have examples of this that I can post when I have more time, but a
quick pointer: check out the overloaded IndexSearcher.search() methods
that accept a Sort. You can do some really interesting slicing and
dicing with it, I think. Try this one on for size:

    example.displayHits(allBooks,
        new Sort(new SortField[]{
            new SortField("category"),
            SortField.FIELD_SCORE,
            new SortField("pubmonth", SortField.INT, true)
        }));

Be clever about indexing the piece you want to group on. I think you
may find this is the solution you're looking for.

    Erik
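
To make the sort-then-group idea concrete, here is a minimal plain-Java sketch. It is not Lucene code: the Hit class, its fields, and topPerDomain() are hypothetical names made up for illustration, and the in-memory list stands in for whatever the searcher returns. It assumes each hit carries a domain value parsed at index time, orders hits by domain and then by descending score (the same shape as the Sort above), keeps the first hit of each domain run, and ranks the survivors by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; none of these types come from Lucene.
class DomainGrouping {

    // Hypothetical stand-in for a search hit.
    static final class Hit {
        final String url;
        final String domain; // assumed to be parsed and stored at index time
        final double score;

        Hit(String url, String domain, double score) {
            this.url = url;
            this.domain = domain;
            this.score = score;
        }
    }

    // Returns at most `limit` hits, one per domain, best score first.
    static List<Hit> topPerDomain(List<Hit> hits, int limit) {
        // Sort by the grouping field, then by descending score, so the
        // best hit of each domain is the first element of its run.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.domain)
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));

        // Keep only the first hit of each run of equal domains.
        List<Hit> best = new ArrayList<>();
        String previousDomain = null;
        for (Hit h : sorted) {
            if (!h.domain.equals(previousDomain)) {
                best.add(h);
                previousDomain = h.domain;
            }
        }

        // Rank the per-domain winners by score and cut to the limit.
        best.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(best.subList(0, Math.min(limit, best.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("http://a.com/1", "a.com", 0.90));
        hits.add(new Hit("http://a.com/2", "a.com", 0.80));
        hits.add(new Hit("http://b.org/1", "b.org", 0.70));
        for (Hit h : topPerDomain(hits, 10)) {
            System.out.println(h.domain + " " + h.url + " " + h.score);
        }
    }
}
```

This sketch still touches every hit once, so it only shows the grouping logic. Whether a sorted search avoids loading thousands of documents in practice depends on how the domain field is indexed, which is the part to be clever about.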