lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Help running out of files
Date Mon, 02 Jan 2012 20:03:44 GMT
hey charlie,

There are a couple of wrong assumptions in your last email, mostly
related to merging. A merge factor of 10 doesn't mean that you end up
with one file; the merge factor isn't related to files at all. That
said, my first guess is that you are using the compound file system
(CFS), in which each segment corresponds to a single file. The merge
factor relates to segments and is responsible for triggering segment
merges based on their size (either in bytes or in documents). For more
details see this blog:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
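
To make that concrete, here's a rough sketch of how the merge factor
is wired up through the 3.1 IndexWriterConfig API (just a fragment;
the directory path and analyzer are placeholders):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.LogByteSizeMergePolicy;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // the merge factor controls how many segments of roughly the same
    // size level may accumulate before a merge is triggered - it says
    // nothing about the number of files per segment
    LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
    mergePolicy.setMergeFactor(10); // 10 is the default

    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
        new StandardAnalyzer(Version.LUCENE_31));
    conf.setMergePolicy(mergePolicy);

    Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder
    IndexWriter writer = new IndexWriter(dir, conf);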

If you are using CFS, one segment is one file. In 3.1, CFS is only
used if the target segment is smaller than the noCFSRatio times the
size of the existing index. That prevents segments that are bigger
than a fraction of the existing index (by default 0.1 -> 10%) from
being packed into CFS.

This means your index might create non-CFS segments with multiple
files (10 in the worst case... maybe I missed one, but anyway...),
which means the number of open files increases.

This is only a guess, since I don't know what you are doing with your
index readers etc. Which platform are you on, and what is the file
descriptor limit? In general it's OK to raise the FD limit on your OS
and just let Lucene do its job. If you are restricted in any way, you
can set LogMergePolicy#setNoCFSRatio(double) to 1.0 and see if you
are still seeing the problem.
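
Something like this (again a sketch against the 3.1 API, reusing the
conf from the fragment above):

    // force CFS for every segment, no matter how large it is relative
    // to the rest of the index, so each segment stays a single .cfs file
    LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
    mergePolicy.setUseCompoundFile(true);
    mergePolicy.setNoCFSRatio(1.0);
    conf.setMergePolicy(mergePolicy);

The tradeoff is some extra copying at merge time, but it keeps the
file count (and thus FD usage) down.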

About commit vs. close - in general it's not a good idea to close your
IW at all. I'd keep it open as long as you can and commit when needed.
Even optimize is somewhat overrated and should be used with care or
not at all... (here is another writeup regarding optimize:
http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
)
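
In code, the pattern I'd suggest looks roughly like this (sketch only;
dir, conf and doc are placeholders from the earlier fragments):

    // open the writer once and keep it around for the lifetime
    // of the application
    IndexWriter writer = new IndexWriter(dir, conf);

    // index documents as they are pushed to you
    writer.addDocument(doc);

    // make changes durable (and visible to newly opened readers)
    // without tearing the writer down - merges keep running in the
    // background either way
    writer.commit();

    // close only on shutdown
    writer.close();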


hope that helps,

simon


On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard
<charlie.hubbard@gmail.com> wrote:
> I'm beginning to think there is an issue with 3.1 that's causing this.
>  After looking over my code again, I realized that the mechanism that does
> the indexing hasn't changed, and the index IS being closed between cycles.
>  Even when using push vs. pull.  This code used to work on Lucene 2.x, but
> I had to upgrade it.  It had been very stable under 2.x, but after
> upgrading to 3.1 I've started seeing this problem.  I double-checked the
> code doing the indexing, and it hasn't changed since I upgraded to 3.1.
>  So the constant in this equation is mostly my code.  What's different is
> 3.1.  Furthermore, when new documents are pulled in through the
> old mechanism, the open file count continues to rise.  Over a 24-hour
> period it has grown by +296 files, but only 10 or 12 documents were indexed.
>
> So is this a known issue?  Should I upgrade to a newer version to fix it?
>
> Thanks
> Charlie
>
> On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard
> <charlie.hubbard@gmail.com> wrote:
>
>> I have a program I recently converted from a pull scheme to a push scheme.
>>  Previously I was pulling down the documents I was indexing, and when I
>> was done I'd close the IndexWriter at the end of each iteration.  Now that
>> I've converted to a push scheme, I'm sent the documents to index, and I
>> write them.  However, this means I'm not closing the IndexWriter, since
>> closing after every document would have poor performance.  Instead I'm
>> keeping the IndexWriter open all the time.  The problem is that after a
>> while the number of open files continues to rise.  I've set the following
>> parameters on the IndexWriter:
>>
>> merge.factor=10
>> max.buffered.docs=1000
>>
>> After going over the API docs I thought this would mean it'd never create
>> more than 10 files before merging those files into a single file, but it's
>> creating hundreds of files.  Since I'm not closing the IndexWriter, will
>> it merge the files?  From reading the API docs it sounded like merging
>> happens regardless of flushing, committing, or closing.  Is that true?
>>  I've measured the files that are increasing, and it's the files
>> associated with this one index I'm leaving open.  I have another index
>> that I do close periodically, and it's not growing like this one.
>>
>> I've read some posts about using commit() instead of close() in situations
>> like this because of its faster performance.  However, commit() just
>> flushes to disk rather than flushing and optimizing like close().  I'm not
>> sure whether commit() is what I need.  Any suggestions?
>>
>> Thanks
>> Charlie
>>


