lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Indexing Strategy for 20 million documents
Date Fri, 08 Oct 2004 13:11:37 GMT
Jeff,

These questions are difficult to answer, because the answer depends on
a number of factors, such as:
- hardware (memory, disk speed, number of disks...)
- index complexity and size (number of fields and their size)
- number of queries/second
- complexity of queries
etc.

I would try putting everything in a single index first, and split it up
only if I see performance issues.  Going from 1 index to N indices is
not a lot of work (not a lot of Lucene-related code).  If searching 1
big index is too slow, split your index, put each index on a separate
disk, and use ParallelMultiSearcher
(http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/ParallelMultiSearcher.html)
to search your indices.
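To show what "searching several indices in parallel" means, here is a toy model in plain Java. It is not Lucene code. Each "index" is a hypothetical in-memory map from document id to text, and the score is a made-up term count rather than Lucene's scoring. Each index is searched on its own thread, and the partial hit lists are then merged by score, which is the basic idea behind ParallelMultiSearcher. When each real index sits on its own disk, those per-index searches can overlap their I/O. The names `ShardedSearchSketch`, `searchOne` and `parallelSearch` are illustrative only and are not part of Lucene's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of a parallel multi-index search (NOT Lucene code).
// Each "index" is a hypothetical in-memory map: document id -> text.
public class ShardedSearchSketch {

    // One search result: document id plus a toy relevance score.
    public record Hit(String docId, int score) {}

    // Search a single index: the score is simply how many whitespace-separated
    // tokens equal the term (case-insensitive). Lucene's real scoring differs.
    static List<Hit> searchOne(Map<String, String> index, String term) {
        String needle = term.toLowerCase();
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : index.entrySet()) {
            int count = 0;
            for (String token : doc.getValue().toLowerCase().split("\\s+")) {
                if (token.equals(needle)) {
                    count++;
                }
            }
            if (count > 0) {
                hits.add(new Hit(doc.getKey(), count));
            }
        }
        return hits;
    }

    // Submit one task per index so the indexes are searched concurrently,
    // then merge the partial hit lists by descending score and keep the top N.
    public static List<Hit> parallelSearch(List<Map<String, String>> indexes,
                                           String term, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Map<String, String> index : indexes) {
                pending.add(pool.submit(() -> searchOne(index, term)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // blocks until that index's search finishes
            }
            // Highest score first; ties broken by document id for a stable order.
            merged.sort(Comparator.comparingInt(Hit::score).reversed()
                    .thenComparing(Hit::docId));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search of one index failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two small indexes standing in for indexes kept on separate disks.
        List<Map<String, String>> indexes = List.of(
                Map.of("a1", "lucene index lucene", "a2", "newspaper archive"),
                Map.of("b1", "lucene search", "b2", "unrelated text"));
        for (Hit h : parallelSearch(indexes, "lucene", 10)) {
            System.out.println(h.docId() + " " + h.score());
        }
    }
}
```

In real code, ParallelMultiSearcher takes the place of `parallelSearch` and real searchers over the on-disk indices take the place of the maps. Later Lucene releases deprecated and then removed ParallelMultiSearcher in favor of giving IndexSearcher an ExecutorService, so check the API documentation for the version you are using.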

Otis


--- Jeff Munson <jmunson@newspaperarchive.com> wrote:

> I am a new user of Lucene.  I am looking to index over 20 million
> documents (and a lot more someday) and am looking for ideas on the
> best indexing/search strategy.
> 
> Which will optimize the Lucene search, one index or multiple indexes?
> Do I create multiple indexes and merge them all together?  Or do I
> create multiple indexes and search on the multiple indexes?  
> 
> Any helpful ideas would be appreciated!
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 

