lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: waaaay too many files in the index!
Date Wed, 04 Feb 2009 10:12:46 GMT

These files are normal Lucene segment files (in compound file  
format).  What's odd is that Lucene is not merging them down to a  
smaller set of segments.

Have you done any advanced things, like customize the deletion or  
merge policy?

When you close you writer, are you using just close() or close(false)?

If you can set the InfoStream on the writer for one of your  
incremental update sessions and post the results, maybe that will shed  
some light one what's going on.

Are you sure there are no exceptions being logged somewhere?  Lucene  
runs merges with background threads (by default), and if those threads  
hit unhandled exceptions it's possible they are logged somewhere you  
wouldn't normally look?

Mike

John Byrne wrote:

> MergeFactor and MergeDocs are left at default values. The indexing  
> is incremental, i.e. whenever someone adds or modifys a file to in  
> svn repository, the lucene index is updated, and the writer/reader/ 
> searcher are refreshed (closed and opened again).,
>
> According to the svn logs for the time the files were created, a few  
> hundred files were added that day.
>
> Overall, the index would have started out with around 150,000 to  
> 200,000 documents, with anything from 100 to 1000 being added per day.
>
> I don't optimize the index at any point, but I've never seen it get  
> like this before.
>
> Thanks,
> John
>
> Erick Erickson wrote:
>> What are your IndexWriter MergFactor and MergeDocs set to? Also, are
>> the dates on all these files indicative of all being create during  
>> the same
>> indexing run?
>>
>> Finally, how many documents are you indexing?
>>
>> Best
>> Erick
>>
>> On Tue, Feb 3, 2009 at 10:26 AM, John Byrne  
>> <john.byrne@propylon.com> wrote:
>>
>>
>>> Hi,
>>>
>>> I've got a weird problem with a lucene index, using 2.3.1. The index
>>> contains 6660 files. I don't know how this happened.Maybe somone  
>>> can tell me
>>> something about the files themselves? (examples below)
>>>
>>> On one day, between 10 and 40 of these files were being created  
>>> every
>>> minute. The index updates are triggered by updates to an SVN  
>>> repository, but
>>> I can't find any corresponding activity in the SVN logs.
>>>
>>> The lucene files all have names like this:
>>>
>>> _1qsw.cfs
>>> _1qsx.cfs
>>> _1qsy.cfs
>>> _1qsz.cfs
>>> _1qt0.cfs
>>>
>>> and are mostly < 5K in size.
>>>
>>> My application uses just one instance each of
>>> IndexReader/IndexWriter/IndexSearcher. From looking at
>>>
>>> Can anyone shed any light on these files? I'm not too hopeful  
>>> about fixing
>>> this index because we are getting "too many open files", even with  
>>> an
>>> unlimited ulimit, but any info/suggestions would be great. Thanks.
>>>
>>> -John
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>   
>> ------------------------------------------------------------------------
>>
>>
>> No virus found in this incoming message.
>> Checked by AVG - http://www.avg.com Version: 8.0.233 / Virus  
>> Database: 270.10.17/1933 - Release Date: 2/3/2009 5:48 PM
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message