lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Venu Durgam <vdur...@yahoo.com>
Subject Re: clustering results
Date Sat, 10 Apr 2004 13:47:28 GMT
Erik,

Thanks for the poiner.
I am not sure how sort can filter out results.
sort will just sort the results right ?

lets say if i had below results 
http://www.b.com/1.html

http://www.a.com/1.html
http://www.b.com/2.html
http://www.a.com/2.html

if you sort by domain name, results might be
http://www.a.com/1.htmlhttp://www.a.com/2.html
http://www.b.com/1.html
http://www.b.com/2.html
 
If i want to have one result per domain. no sorting, just filtering out some results.
http://www.b.com/1.html

http://www.a.com/1.html

can this still be achieved using sort? If not any other ways of doing this ?

Thanks.

Erik Hatcher <erik@ehatchersolutions.com> wrote:
On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote:
> I have an index of urls, and need to display the top 10 results for a 
> given query, but want to display only 1 result per domain. It seems 
> that using either Hits or a HitCollector, I'll need to access the doc, 
> grab the domain field (I'll have it parse ahead of time) and only 
> take/display documents that are unique.
>
> A significant percentage of the time I expect I may have to access 
> thousands of results before I find 10 in unique domains. Is there a 
> faster approach that won't require accessing thousands of documents?

I have examples of this that I can post when I have more time, but a 
quick pointer... check out the overloaded IndexSearcher.search() 
methods which accept a Sort. You can do really really interesting 
slicing and dicing, I think, using it. Try this one on for size:

example.displayHits(allBooks,
new Sort(new SortField[]{
new SortField("category"),
SortField.FIELD_SCORE,
new SortField("pubmonth", SortField.INT, true)
}));

Be clever indexing the piece you want to group on - I think you may 
find this the solution you're looking for.

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message