lucene-java-user mailing list archives

From Christian Reuschling <christian.reuschl...@gmail.com>
Subject ParallelMultiSearcher and idf
Date Tue, 04 Aug 2009 09:50:16 GMT
Hello,

When searching over multiple indices, we create one IndexReader for each index
and wrap them in a MultiReader, which we then use to create the IndexSearcher.

This is fine for searching multiple indices on one machine, but when the
indices are distributed over the (intra)net, this approach has several
drawbacks:

- searching/scoring/sorting happens 100% on the client machine, so every client
  needs all the RAM and CPU power.
- all the data necessary for scoring must go over the net, so the traffic
  should be significantly higher.
- as a result, overall performance suffers.

Nevertheless, creating a MultiReader and building a searcher on top of it has
one advantage (or at least it can be one, depending on the scenario): the
document frequencies of a term are summed up across the indices, so it is 100%
transparent for scoring whether the indices are split or not.

I'm wondering whether it is possible to get the advantages of both
scenarios, for example by first summing up the document frequencies of the
query terms, and then sending them together with the query to every (remote)
searcher of ParallelMultiSearcher for scoring.

Maybe this is exactly what ParallelMultiSearcher does, and I haven't seen it?
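
To make the idea concrete, here is a minimal sketch in plain Java of the
two-phase approach described above: first collect and sum the per-index
document frequencies, then score against the summed values. It does not use
any Lucene classes. ShardStats, globalMaxDoc and globalDocFreq are hypothetical
names made up for illustration, and the idf formula log(numDocs/(docFreq+1)) + 1
is assumed to match what Lucene's default similarity computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalIdfSketch {

    /** Hypothetical per-index statistics that each (remote) searcher would report. */
    static final class ShardStats {
        final int maxDoc;                    // number of documents in this index
        final Map<String, Integer> docFreq;  // term -> document frequency in this index

        ShardStats(int maxDoc, Map<String, Integer> docFreq) {
            this.maxDoc = maxDoc;
            this.docFreq = docFreq;
        }
    }

    /** Phase 1a: total number of documents over all indices. */
    static int globalMaxDoc(List<ShardStats> shards) {
        int total = 0;
        for (ShardStats s : shards) {
            total += s.maxDoc;
        }
        return total;
    }

    /** Phase 1b: sum the document frequency of every query term over all indices. */
    static Map<String, Integer> globalDocFreq(List<ShardStats> shards, List<String> queryTerms) {
        Map<String, Integer> summed = new HashMap<>();
        for (String term : queryTerms) {
            int df = 0;
            for (ShardStats s : shards) {
                Integer local = s.docFreq.get(term);
                if (local != null) {
                    df += local;
                }
            }
            summed.put(term, df);
        }
        return summed;
    }

    /** Assumed idf formula: log(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        Map<String, Integer> dfA = new HashMap<>();
        dfA.put("lucene", 10);
        Map<String, Integer> dfB = new HashMap<>();
        dfB.put("lucene", 989);

        ShardStats a = new ShardStats(1000, dfA);
        ShardStats b = new ShardStats(9000, dfB);
        List<ShardStats> shards = Arrays.asList(a, b);

        int numDocs = globalMaxDoc(shards);
        int df = globalDocFreq(shards, Arrays.asList("lucene")).get("lucene");

        // Phase 2: each searcher would receive (query, df, numDocs) and score with these.
        System.out.println("idf using only index A: " + idf(10, a.maxDoc));
        System.out.println("idf using only index B: " + idf(989, b.maxDoc));
        System.out.println("idf using summed stats: " + idf(df, numDocs));
    }
}
```

With purely local statistics the same term gets a different idf in each index,
so scores from different indices are not directly comparable. With the summed
statistics every searcher would use the same value, which is the behaviour the
MultiReader approach gives on a single machine.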


Thanks for any clarification!

Chris
