lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: how to use DuplicateFilter to get unique documents based on a fieldName
Date Thu, 04 Mar 2010 15:07:23 GMT
If the field you want to use for deduping is ISBN, create a
DuplicateFilter using whatever your ISBN field name is as the field
name and pass that to one of the search methods that takes a filter.
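
To make the effect on the hit count concrete, here is a minimal plain-Java sketch of what deduping on a field amounts to: one hit is kept per distinct ISBN, so the size of the result is the unique count. This is not Lucene's API. The `DedupeHits` class, the `Hit` type, and the sample values are hypothetical names for illustration, and keeping the first hit per ISBN is an assumption of the sketch; with the real filter, Lucene does this work inside the search call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupeHits {
    /** A search hit reduced to the two values this sketch needs (hypothetical type). */
    public static final class Hit {
        public final int docId;
        public final String isbn;

        public Hit(int docId, String isbn) {
            this.docId = docId;
            this.isbn = isbn;
        }
    }

    /** Keeps the first hit seen for each ISBN, preserving the original result order. */
    public static List<Hit> uniqueByIsbn(List<Hit> hits) {
        Map<String, Hit> firstPerIsbn = new LinkedHashMap<>();
        for (Hit hit : hits) {
            // putIfAbsent ignores later hits that repeat an ISBN already seen.
            firstPerIsbn.putIfAbsent(hit.isbn, hit);
        }
        return new ArrayList<>(firstPerIsbn.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, "isbn-1")); // same book, first category
        hits.add(new Hit(1, "isbn-2"));
        hits.add(new Hit(2, "isbn-1")); // same book, second category

        List<Hit> unique = uniqueByIsbn(hits);
        System.out.println("raw hits: " + hits.size());
        System.out.println("unique books: " + unique.size());
    }
}
```

The raw list corresponds to the count that includes repeats; the deduplicated list corresponds to the count a filter on the ISBN field is meant to give.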

If your index is large, I'd be worried about performance and would look
at deduping at indexing time, i.e. have one Lucene document per ISBN.
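
A rough sketch of the index-time alternative, in plain Java and without any Lucene calls: the database rows are grouped by ISBN before indexing, so each book yields a single record that carries all of its categories. The `MergeByIsbn` class, the `(isbn, category)` row layout, and the sample values are assumptions for illustration only; each resulting entry would then be written as one document, with the category stored as a multi-valued field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeByIsbn {
    /**
     * Groups rows of the form {isbn, category} so that each ISBN appears once,
     * mapped to the set of categories it belongs to.
     */
    public static Map<String, Set<String>> categoriesByIsbn(List<String[]> rows) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (String[] row : rows) {
            String isbn = row[0];
            String category = row[1];
            // Create the category set on first sight of an ISBN, then add to it.
            merged.computeIfAbsent(isbn, key -> new LinkedHashSet<>()).add(category);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"isbn-1", "fiction"});
        rows.add(new String[] {"isbn-2", "history"});
        rows.add(new String[] {"isbn-1", "classics"}); // same book, second category

        // One entry per ISBN; each entry would become one indexed document.
        for (Map.Entry<String, Set<String>> entry : categoriesByIsbn(rows).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

With one document per book, the total hit count is already the unique count and no filter is needed at search time.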


On Thu, Mar 4, 2010 at 12:43 PM, anisha@ekkitab <> wrote:
> Hi there. Could someone help me with the usage of DuplicateFilter? Here is
> my problem.
> I have created a search index on book ID, title, and author from a database
> of books which fall under various categories. Some books fall under more
> than one category. Now, when I issue a search, I get back 'X' books matching
> the search criteria, some of which are repeated because those books are in
> different documents, which is the expected behaviour.
> I use TopFieldDocCollector.getTotalHits() to get the total count, but
> this includes the repeats mentioned above, so it is not the actual
> count. Hence, when I issue a search on title or author, I want to get a unique
> count / list of books. How do I use DuplicateFilter to achieve this?
> Please help.
> Regards,
> Anish
