lucene-general mailing list archives

From "sendtoprat@yahoo.co.in" <sendtop...@yahoo.co.in>
Subject Re: InderxWriter.optimize() fail
Date Tue, 10 Feb 2009 23:14:47 GMT

We are using Lucene 2.4.


Michael McCandless-2 wrote:
> 
> 
> Which version of Lucene are you using?
> 
> More questions/answers below...
> 
> sendtoprat@yahoo.co.in wrote:
> 
>> We scan the web and index pages in Lucene. Our index size is in the range
>> of 500K to 1 million documents. As we index pages, we also call
>> IndexWriter.optimize after certain time intervals [I believe Lucene also
>> does optimization in the background?].
> 
> Actually Lucene merges segments periodically in the background, but does
> not optimize.
> 
>> So far it has worked great. But for just this one scan we noticed that
>> our index size grew to 90 GB for about 900K documents [typical index size
>> should be around 17-18 GB]. We are not sure what caused the index to grow
>> this large. Outside of our system, when we did a forced
>> IndexWriter.optimize() on this 90 GB Lucene index, it indeed shrank to
>> 17 GB. My question is: what may have caused the size to grow to 90 GB?
> 
> Optimize requires free temporary disk space equal to 1X the index size.
> 
> Do you have an IndexReader open on the index when optimize runs?  That
> ties up another 1X.
> 
> That should mean a 17-18 GB index takes 51-54 GB, so I'm not sure why
> you got up to 90 GB.  There were no exceptions, even in BG merge threads?
> 
> Are you reopening readers while optimize is running?  In theory that could
> tie up even more disk space (eg if you didn't close the old readers).
> 
>> Did the size grow because optimization failed ?
> 
> If optimization fails it would remove the partially written files, so I
> don't think this would explain too-high disk usage.
> 
>> Does optimization fail if there is any foreign file in the Lucene index
>> directory? [We tried optimizing with foreign files in the Lucene
>> directory, and Lucene still optimized the index.]
> 
> Foreign files are harmless as long as they don't conflict w/ Lucene's
> file names.
> 
> Mike
> 
> 
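
For reference, the disk-space arithmetic in Mike's reply (1X for the index itself, 1X of temporary space for optimize, and another 1X if an IndexReader is open) can be written out as a small sketch. This is plain Java with no Lucene dependency; the class name, the method name, and the `unclosedOldReaders` parameter are made up for illustration. Treating each unclosed old reader as pinning one more full copy is only a rough worst-case assumption, not something Lucene documents.

```java
public class OptimizeDiskBudget {

    /**
     * Rough peak disk usage (in GB) while optimize runs.
     *
     * @param indexGb            size of the optimized index in GB
     * @param readerOpen         true if an IndexReader is open during optimize
     * @param unclosedOldReaders hypothetical count of reopened-but-never-closed
     *                           readers; each is assumed (worst case) to pin
     *                           one more full copy of the index
     */
    static double peakGb(double indexGb, boolean readerOpen, int unclosedOldReaders) {
        double multiplier = 1.0;          // the index itself
        multiplier += 1.0;                // temporary space needed by optimize
        if (readerOpen) {
            multiplier += 1.0;            // files held by the open reader
        }
        multiplier += unclosedOldReaders; // assumption: 1X per leaked reader
        return indexGb * multiplier;
    }

    public static void main(String[] args) {
        // The 17-18 GB index from this thread, with one reader open.
        System.out.println("low estimate:  " + peakGb(17.0, true, 0) + " GB");
        System.out.println("high estimate: " + peakGb(18.0, true, 0) + " GB");

        // Under the worst-case assumption above, two leaked readers
        // on an 18 GB index would account for a 5X footprint.
        System.out.println("with 2 unclosed readers: " + peakGb(18.0, true, 2) + " GB");
    }
}
```

Under that assumption, a 90 GB footprint would be about 5X an 18 GB index, which is in line with old readers not being closed, though it does not prove that was the cause.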

-- 
View this message in context: http://www.nabble.com/InderxWriter.optimize%28%29-fail-tp21937277p21944987.html
Sent from the Lucene - General mailing list archive at Nabble.com.

