lucene-java-user mailing list archives

From Justin Swanhart <greenl...@gmail.com>
Subject Re: Indexing Strategy for 20 million documents
Date Fri, 08 Oct 2004 16:53:21 GMT
It depends on a lot of factors.  I myself use multiple indexes for
about 10M documents.  My documents are transient: each day I get about
400K and remove about 400K, and I always remove an entire day's
documents at one time.  It is much faster and easier to delete the
Lucene index for the day I am removing than to loop through one big
index and remove the entries with the IndexReader.  Since my data is
also partitioned by day in my database, I essentially do the same
thing there with "truncate table."
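
A minimal sketch of the per-day layout described above, assuming each
day's index lives in its own directory named by ISO date under a common
root.  The class and method names (DailyIndexDirs, dirFor, dropDay) and
the directory naming are hypothetical, not part of Lucene; the code only
uses the JDK file API to show why dropping a day is cheap: it is a
directory delete rather than a per-document delete.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.LocalDate;

final class DailyIndexDirs {
    /** Directory for one day's index, e.g. root/2004-10-08 (assumed layout). */
    static Path dirFor(Path root, LocalDate day) {
        return root.resolve(day.toString()); // LocalDate prints as yyyy-MM-dd
    }

    /**
     * Drops a whole day's index by deleting its directory tree.
     * Returns false if no directory exists for that day.
     */
    static boolean dropDay(Path root, LocalDate day) throws IOException {
        Path dir = dirFor(root, day);
        if (!Files.isDirectory(dir)) {
            return false;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // remove each index file
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e)
                    throws IOException {
                if (e != null) {
                    throw e;
                }
                Files.delete(d); // remove the directory once it is empty
                return FileVisitResult.CONTINUE;
            }
        });
        return true;
    }
}
```

Any searcher or reader still open on that day's index would presumably
need to be closed before the directory is removed.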

I use a ParallelMultiSearcher object to search the indexes.  I store
my indexes on a 14-disk, 15k rpm Fibre Channel RAID 1+0 array (striped
mirrors).
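
To illustrate the idea behind searching several indexes in parallel,
here is a JDK-only sketch: one task per index, with the per-index hits
merged into a single top-N list.  It is not Lucene's
ParallelMultiSearcher and uses no Lucene classes; Shard, Hit and
searchAll are hypothetical names, and merging by raw score is a
simplification that assumes scores from different indexes are
comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FanOutSearch {
    /** One scored hit; a stand-in for a search result, not a Lucene type. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical per-index searcher (for example, one per daily index). */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Queries every shard on its own thread and returns the best topN hits overall. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN)
            throws InterruptedException, ExecutionException {
        if (shards.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                futures.add(pool.submit(task)); // all shards run concurrently
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                merged.addAll(f.get()); // wait for each shard's hits
            }
            // Highest score first, then keep only the overall top N.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

With one shard per disk, the per-shard searches can overlap their I/O,
which is the reason for fanning out rather than searching the indexes
one after another.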

I get very good performance in both updating and searching indexes.

On Fri, 8 Oct 2004 06:11:37 -0700 (PDT), Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Jeff,
> 
> These questions are difficult to answer, because the answer depends on
> a number of factors, such as:
> - hardware (memory, disk speed, number of disks...)
> - index complexity and size (number of fields and their size)
> - number of queries/second
> - complexity of queries
> etc.
> 
> I would try putting everything in a single index first, and split it up
> only if I see performance issues.  Going from 1 index to N indices is
> not a lot of work (not a lot of Lucene-related code).  If searching 1
> big index is too slow, split your index, put each index on a separate
> disk, and use ParallelMultiSearcher
> (http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/ParallelMultiSearcher.html)
> to search your indices.
> 
> Otis
> 
> 
> 
> 
> --- Jeff Munson <jmunson@newspaperarchive.com> wrote:
> 
> > I am a new user of Lucene.  I am looking to index over 20 million
> > documents (and a lot more someday) and am looking for ideas on the
> > best
> > indexing/search strategy.
> >
> > Which will optimize the Lucene search, one index or multiple indexes?
> > Do I create multiple indexes and merge them all together?  Or do I
> > create multiple indexes and search on the multiple indexes?
> >
> > Any helpful ideas would be appreciated!
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
> 
> 
> 
> 
> 
>
