lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: waaaay too many files in the index!
Date Wed, 04 Feb 2009 12:20:02 GMT

OK thanks for bringing closure.

Mike
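
The cause John describes in the quoted message below is two application instances holding a writer on the same index after failover. One way to make a standby refuse to write while the primary is still alive is an OS-level lock on a marker file. The sketch below is an application-level guard and not Lucene's own write lock; the thread does not say why that lock did not stop the second writer. The class name `SingleWriterGuard` and the file name `app-writer.lock` are hypothetical, and the sketch assumes both instances see the same filesystem and that the filesystem honors OS file locks.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical application-level guard; not part of Lucene. */
public class SingleWriterGuard implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private SingleWriterGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /**
     * Tries to take an exclusive lock on a marker file next to the index.
     * Returns null if the lock is already held, by this JVM or by another process.
     */
    public static SingleWriterGuard tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock l = ch.tryLock(); // null when another process holds the lock
            if (l != null) {
                return new SingleWriterGuard(ch, l);
            }
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM; report failure below.
        }
        ch.close();
        return null;
    }

    /** Releases the lock so that another instance may become the writer. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory standing in for the index directory.
        Path lockFile = Files.createTempDirectory("index").resolve("app-writer.lock");
        try (SingleWriterGuard primary = tryAcquire(lockFile)) {
            System.out.println("primary acquired: " + (primary != null));
            // A second attempt while the first guard is open should be refused.
            SingleWriterGuard standby = tryAcquire(lockFile);
            System.out.println("standby acquired: " + (standby != null));
        }
    }
}
```

In a real failover setup each instance would call `tryAcquire` once, before opening its IndexWriter, and stay read-only if it gets null. The two calls inside one `main` only illustrate the refusal path: the `FileLock` documentation warns that on some systems closing any channel on a file releases every lock the JVM holds on it, so a single process should use one channel for the lock file.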

John Byrne wrote:

> No, I'm not messing with the delete or merge policy, but I think I  
> know what went wrong...
>
> We have 2 instances of the application, for failover. They are never  
> supposed to be active at the same time, but I just discovered a  
> condition that can cause exactly that to happen. When we detect a  
> failure and switch to the second instance, it was assumed that the  
> first instance was dead - but it looks like it still had an open  
> writer on the index, leading to...concurrent write access.
>
> Logs confirmed that failover happened minutes before these files  
> started appearing. Mystery solved!
>
> Thanks everyone.
>
> Michael McCandless wrote:
>>
>> These files are normal Lucene segment files (in compound file  
>> format).  What's odd is that Lucene is not merging them down to a  
>> smaller set of segments.
>>
>> Have you done any advanced things, like customize the deletion or  
>> merge policy?
>>
>> When you close your writer, are you using just close() or  
>> close(false)?
>>
>> If you can set the InfoStream on the writer for one of your  
>> incremental update sessions and post the results, maybe that will  
>> shed some light on what's going on.
>>
>> Are you sure there are no exceptions being logged somewhere?   
>> Lucene runs merges with background threads (by default), and if  
>> those threads hit unhandled exceptions it's possible they are  
>> logged somewhere you wouldn't normally look?
>>
>> Mike
>>
>> John Byrne wrote:
>>
>>> MergeFactor and MergeDocs are left at default values. The indexing  
>>> is incremental, i.e. whenever someone adds or modifies a file in the  
>>> svn repository, the lucene index is updated, and the writer/reader/ 
>>> searcher are refreshed (closed and opened again).
>>>
>>> According to the svn logs for the time the files were created, a  
>>> few hundred files were added that day.
>>>
>>> Overall, the index would have started out with around 150,000 to  
>>> 200,000 documents, with anything from 100 to 1000 being added per  
>>> day.
>>>
>>> I don't optimize the index at any point, but I've never seen it  
>>> get like this before.
>>>
>>> Thanks,
>>> John
>>>
>>> Erick Erickson wrote:
>>>> What are your IndexWriter MergeFactor and MergeDocs set to? Also, are
>>>> the dates on all these files indicative of all being created during the
>>>> same indexing run?
>>>>
>>>> Finally, how many documents are you indexing?
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Tue, Feb 3, 2009 at 10:26 AM, John Byrne <john.byrne@propylon.com> wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I've got a weird problem with a lucene index, using 2.3.1. The index
>>>>> contains 6660 files. I don't know how this happened. Maybe someone can
>>>>> tell me something about the files themselves? (examples below)
>>>>>
>>>>> On one day, between 10 and 40 of these files were being created every
>>>>> minute. The index updates are triggered by updates to an SVN repository,
>>>>> but I can't find any corresponding activity in the SVN logs.
>>>>>
>>>>> The lucene files all have names like this:
>>>>>
>>>>> _1qsw.cfs
>>>>> _1qsx.cfs
>>>>> _1qsy.cfs
>>>>> _1qsz.cfs
>>>>> _1qt0.cfs
>>>>>
>>>>> and are mostly < 5K in size.
>>>>>
>>>>> My application uses just one instance each of
>>>>> IndexReader/IndexWriter/IndexSearcher. From looking at
>>>>>
>>>>> Can anyone shed any light on these files? I'm not too hopeful about
>>>>> fixing this index because we are getting "too many open files", even
>>>>> with an unlimited ulimit, but any info/suggestions would be great. Thanks.
>>>>>
>>>>> -John
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG - http://www.avg.com Version: 8.0.233 / Virus  
>>>> Database: 270.10.17/1933 - Release Date: 2/3/2009 5:48 PM
>>>>
>>>>
>>>
>>
>
>
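
For anyone reading the file names quoted above: as far as I know, Lucene names each new segment with an underscore followed by a counter written in base 36, so the names can be decoded to see how many segments the index has ever created and whether a run of files is consecutive. The sketch below uses only the JDK; the class and method names (`SegmentNames`, `segmentNumber`) are made up for illustration and are not Lucene API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for reading segment file names; not part of Lucene. */
public class SegmentNames {

    /** Decodes a name such as "_1qsw.cfs" into the value of its base-36 counter. */
    static long segmentNumber(String fileName) {
        int dot = fileName.indexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        if (!stem.startsWith("_")) {
            throw new IllegalArgumentException("not a segment file name: " + fileName);
        }
        // Character.MAX_RADIX is 36: digits 0-9 followed by letters a-z.
        return Long.parseLong(stem.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // The five example names from the original post.
        List<String> names = Arrays.asList(
                "_1qsw.cfs", "_1qsx.cfs", "_1qsy.cfs", "_1qsz.cfs", "_1qt0.cfs");
        for (String name : names) {
            System.out.println(name + " -> " + segmentNumber(name));
        }
        // Prints 81392 through 81396, one value per file.
    }
}
```

The five examples decode to consecutive numbers, which fits a writer flushing one tiny segment after another without those segments being merged away.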

