lucene-java-user mailing list archives

From Jason Calabrese
Subject De-duping MultiSearcher results
Date Mon, 14 Nov 2005 22:53:42 GMT

In the project I'm working on we have a separate index for each database.  
There are 12 databases now, but in the future there may be as many as 20.  
They all have their own release cycle so I don't want to merge the indexes.

The databases all have some overlap between them.  We manage this by creating 
a unique GUID for each entity.  If an entity is in multiple db's it will have 
the same GUID in each db.

Currently I'm using the MultiSearcher to run a user's query against each of the 
db's, then I use the brute force approach of looping through all the returned 
docs to remove dups using the guid field in the index.
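
To make the de-duping step concrete, here is a minimal sketch of that kind of loop in plain Java. It is not the actual project code: the Hit class and the GuidDedup.dedupe method are hypothetical names used only for illustration, and the sketch assumes the guid of every returned doc has already been read into memory in ranked order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GuidDedup {

    // Hypothetical stand-in for one returned doc; this is not a Lucene class.
    static final class Hit {
        final String guid;   // value of the "guid" field for this doc
        final String source; // name of the database index the doc came from

        Hit(String guid, String source) {
            this.guid = guid;
            this.source = source;
        }
    }

    // Keeps the first hit seen for each GUID and preserves the incoming order,
    // so the highest-ranked copy of a duplicated entity is the one retained.
    static List<Hit> dedupe(List<Hit> rankedHits) {
        Set<String> seen = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // Set.add returns false when the GUID is already in the set.
            if (seen.add(hit.guid)) {
                unique.add(hit);
            }
        }
        return unique;
    }
}
```

The sketch only covers the duplicate check itself, which is a single pass with one set lookup per doc. It does not cover running the query or reading the guid field from each returned doc.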

This works fine when the results are under about 5,000 documents, but when 
there is a large number of results a search takes way too long.

Does anyone know of a better and more efficient way to do this?

