lucene-java-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Does Lucene Supports Billions of data
Date Thu, 01 May 2008 22:13:04 GMT
From: John Wang <john.wang@gmail.com>
[...]
> sub index 1: 1 billion docs
> sub index 2: 1 billion docs
> sub index 3: 1 billion docs
> 
> federating search to these subindexes, you represent an index of 3 
> billion docs, and all internal doc ids are of type int.

That falls under Daniel's "...unless you wrap your own framework around it". The problem with
the solution you're describing is that it's not functionally equivalent to a single index
of 3 billion docs.

If you just create 3 independent indexes and merge the top hits from all 3, the ranking of
the documents will be wrong: each index computes its scores from its own term statistics,
so a score from one index is not directly comparable to a score from another. You'll need to
make sure that the scores from the different indexes can be compared. That's tricky when the
score depends on the frequency of the terms in the whole corpus.
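
To make the comparability problem concrete, here is a rough sketch in Python. It assumes the idf term has the shape 1 + ln(numDocs / (docFreq + 1)), which is my understanding of what Lucene's default similarity uses; the per-index document frequencies are invented for illustration, and the names idf, shards, local_idfs and global_idf are hypothetical. It only shows the idf part of the score, not a full scoring function.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency, assumed shape: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical (doc_freq, num_docs) for one term in each of the 3 sub indexes.
# The term is rare in the first index and common in the third.
shards = [
    (1_000, 1_000_000_000),
    (1_000_000, 1_000_000_000),
    (50_000_000, 1_000_000_000),
]

# What each sub index would use if it scored in isolation.
local_idfs = [idf(df, n) for df, n in shards]

# What a single index of 3 billion docs would use: statistics summed over the corpus.
global_df = sum(df for df, _ in shards)
global_n = sum(n for _, n in shards)
global_idf = idf(global_df, global_n)

if __name__ == "__main__":
    for i, value in enumerate(local_idfs, start=1):
        print(f"sub index {i}: local idf = {value:.2f}")
    print(f"whole corpus: global idf = {global_idf:.2f}")
```

With these made-up numbers the same term gets a different weight in every sub index, and none of them matches the weight a single 3 billion doc index would give it. A federating layer would have to collect the document frequencies from all sub indexes first and score with the summed values to get rankings that match.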

