lucene-dev mailing list archives

From Leo Galambos <Le...@seznam.cz>
Subject Re: large index scalability and java 1.1 compatibility question
Date Wed, 21 Jan 2004 01:05:16 GMT
Do you mean that you transfer all data through a pipe to an RDBMS? What 
is the size of your index?

By the way, what problem was solved by the relational store? Concurrent 
optimize() processes without an exclusive lock?

Cheers,
Leo


Robert Engels wrote:

>I've gotten around this problem by using a relational store for the index,
>so I can incrementally update the index, and simultaneously 'optimize' in
>the background periodically.
>
>
>-----Original Message-----
>From: Leo Galambos [mailto:Leo.G@seznam.cz]
>Sent: Tuesday, January 20, 2004 3:58 PM
>To: Lucene Developers List
>Subject: Re: large index scalability and java 1.1 compatibility question
>
>
>I'm sorry if you receive this e-mail twice. My ISP has problems with
>SMTP relay.
>
>Mike Sawka wrote:
>
>>We are currently running some multi-gigabyte indexes with over 10
>>million documents, and the "optimize" time is starting to become a
>>problem.  For our largest indexes we're already seeing times of 10-20
>>minutes, on a fairly decent machine, which is starting to hit the
>>threshold of acceptability for us (and will become unbearable as the
>>index grows 2-10 times larger).  So I've got two questions:
>>
>>  * Are there any tricks that you guys use to run large (incrementally
>>updatable) indexes?  I've already setup a mirroring system so I have one
>>index that is always searchable while the other one is incrementally
>>updating (and they swap periodically).
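
The mirroring setup described above can be sketched as a pair of on-disk index paths with an atomic swap: one index is always live for searches while the other is incrementally updated (and optimized) in the background. The class and path names are illustrative only, not part of Lucene's API:

```java
// Sketch of the index-mirroring scheme: searchers always open the
// active path, a background process updates the other copy, and swap()
// makes the freshly updated index live. IndexMirror is a hypothetical
// helper, not a Lucene class.
import java.util.concurrent.atomic.AtomicReference;

class IndexMirror {
    private final AtomicReference<String> active;   // path searchers open
    private final AtomicReference<String> updating; // path the updater writes

    IndexMirror(String pathA, String pathB) {
        this.active = new AtomicReference<>(pathA);
        this.updating = new AtomicReference<>(pathB);
    }

    String activePath()   { return active.get(); }
    String updatingPath() { return updating.get(); }

    // After the background copy is updated and optimized, make it live;
    // the old live index becomes the updating copy for the next cycle.
    synchronized void swap() {
        String old = active.get();
        active.set(updating.get());
        updating.set(old);
    }
}
```

New searchers would open activePath() after each swap, so the expensive optimize() only ever runs against the copy that is not serving queries.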
>>
>
>The optimize() routine is a bottleneck in Lucene. You have two options:
>a) do not call optimize() at all; or b) batch your modifications until a
>large fraction of the index has changed (>75% of items), and only then
>call optimize(). Somebody may give you practical advice, but there is a
>theoretical barrier that cannot be overcome.
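
Option (b) amounts to tracking how much of the index has churned since the last optimize() and only paying the optimize() cost once that fraction crosses a threshold. A minimal sketch of such a policy, with hypothetical names and the 75% figure from the advice above, not anything from Lucene itself:

```java
// Illustrative churn-threshold policy for deferring optimize():
// count documents added or removed since the last optimize(), and
// trigger only once churn reaches a fraction (here 75%) of the index.
// OptimizePolicy is a hypothetical helper, not a Lucene class.
class OptimizePolicy {
    private final double threshold; // e.g. 0.75
    private long indexSize;         // total docs in the index
    private long churned;           // docs added/removed since last optimize

    OptimizePolicy(long indexSize, double threshold) {
        this.indexSize = indexSize;
        this.threshold = threshold;
    }

    void recordChange(long docs) { churned += docs; }

    // True once enough of the index has changed to justify optimize().
    boolean shouldOptimize() {
        return indexSize > 0 && (double) churned / indexSize >= threshold;
    }

    // Call after optimize() completes, with the new index size.
    void reset(long newSize) {
        indexSize = newSize;
        churned = 0;
    }
}
```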
>
>I was interested in this problem last year, and the method developed for
>another OSS search engine is presented here:
>http://www.egothor.org/temp/00-combi.png. The figure compares my method
>with a build-from-scratch approach at merge factor 100. Lucene (merge
>factor 100) seems to be slower than my method: by about 40% for N=2^16
>and about 15-20% for N=2^46. Add those percentages to the plotted
>numbers to see what Lucene does and when.
>
>Using the figure, you can analyze whether it is cheaper to rebuild your
>index from scratch or to repair it with insert/removeDoc()/optimize().
>If both approaches fail, you should redesign your application.
>
>Hope this helps.
>
>Leo
>
>PS: The figure is based on a simulation of my algorithm; the results for
>N<2^26 have already been verified in a real system. "number of documents"
>is log_2 of the total number of documents in the index (2^16...2^46),
>and "operations needed" sums the I/O read and write operations and
>compares them to the I/O of a rebuild from scratch.
>
>N=total number of docs in the index
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>

