lucene-java-user mailing list archives

From Dawid Weiss
Subject Re: Removing similar documents from search results
Date Mon, 14 Mar 2005 17:55:41 GMT

Hi Miles :)

I can imagine that if you apply clustering to the search results anyway, 
the cluster assignments can help you identify 'similar' results and 
reorder the output list.

Just a thought.
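
To make the idea a bit more concrete, here is a rough sketch in plain Java.
It assumes every hit already carries a cluster label from whatever
clustering step you run over the results. The Hit class, the clusterId
field and the demoteSimilar method are made-up names for illustration only,
not part of Lucene or any contrib package. The best-ranked hit of each
cluster stays near the top, and the other members of a cluster are pushed
to the end of the list instead of being dropped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Hit, clusterId and demoteSimilar are illustrative
// names, not Lucene API. Cluster labels are assumed to come from elsewhere.
class ClusterDemotion {

    /** A search hit plus the label assigned by an external clustering step. */
    static final class Hit {
        final String docId;
        final int clusterId;

        Hit(String docId, int clusterId) {
            this.docId = docId;
            this.clusterId = clusterId;
        }
    }

    /**
     * Keeps the first (best-ranked) hit of each cluster in its original
     * relative order and moves all remaining hits, also in their original
     * order, to the end of the returned list.
     */
    static List<Hit> demoteSimilar(List<Hit> ranked) {
        Set<Integer> seenClusters = new HashSet<>();
        List<Hit> representatives = new ArrayList<>();
        List<Hit> demoted = new ArrayList<>();
        for (Hit hit : ranked) {
            // Set.add returns true only the first time a cluster is seen.
            if (seenClusters.add(hit.clusterId)) {
                representatives.add(hit);
            } else {
                demoted.add(hit);
            }
        }
        List<Hit> reordered = new ArrayList<>(representatives);
        reordered.addAll(demoted);
        return reordered;
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<>();
        ranked.add(new Hit("a", 1));
        ranked.add(new Hit("b", 1));
        ranked.add(new Hit("c", 2));
        ranked.add(new Hit("d", 1));
        ranked.add(new Hit("e", 3));
        for (Hit hit : demoteSimilar(ranked)) {
            System.out.println(hit.docId + " (cluster " + hit.clusterId + ")");
        }
    }
}
```

If you would rather hide the similar hits, Google-style, you could return
only the representatives and offer the demoted ones behind a "show omitted
results" link. How good this is depends entirely on the quality of the
clustering, of course.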


Miles Barr wrote:
> Has anyone tried to remove similar documents from their search results?
> It looks like Google does some on the fly filtering of the results,
> hiding pages which it thinks are too similar, i.e. when you see:
> "In order to show you the most relevant results, we have omitted some
> entries very similar to the 7 already displayed.
> If you like, you can repeat the search with the omitted results
> included."
> at the bottom of the page.
> Is there anything in Lucene or one of the contrib packages that compares
> two documents?
