lucene-java-user mailing list archives

From "Michael A. Schoen" <>
Subject clustering results
Date Sat, 10 Apr 2004 00:16:25 GMT
I have an index of URLs and need to display the top 10 results for a given query, but I want
to display only one result per domain. It seems that with either Hits or a HitCollector, I'll
need to load each doc, grab its domain field (which I'll have parsed ahead of time), and
take/display only documents whose domain has not already been shown.
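
To make the filtering step concrete, here is a minimal sketch of the approach described above. It is plain Java and does not use any Lucene API: the class name, the method name, and the `domainsByRank` array are hypothetical stand-ins for walking the hits in score order and reading each document's domain field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueDomainFilter {
    /**
     * Walks hits in rank order and keeps the first hit seen for each domain.
     * domainsByRank[i] is the domain of the i-th ranked hit (hypothetical input;
     * in practice it would be read from the document's stored domain field).
     * Returns the rank positions of the kept hits, at most maxResults of them.
     */
    public static List<Integer> firstPerDomain(String[] domainsByRank, int maxResults) {
        List<Integer> kept = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // Stop as soon as enough unique domains have been collected.
        for (int i = 0; i < domainsByRank.length && kept.size() < maxResults; i++) {
            // Set.add returns false when the domain was already seen.
            if (seen.add(domainsByRank[i])) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] domains = {"a.com", "a.com", "b.org", "a.com", "c.net", "b.org"};
        System.out.println(firstPerDomain(domains, 10)); // prints [0, 2, 4]
    }
}
```

The loop itself is cheap; the cost in question is whatever it takes to obtain each hit's domain, which this sketch assumes is already available as an array.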

I expect that a significant percentage of the time I'll have to access thousands of results
before I find 10 from unique domains. Is there a faster approach that won't require accessing
thousands of documents?