lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Help running out of files
Date Fri, 06 Jan 2012 20:06:57 GMT
Something that did change at some point, can't remember when, is the
way that discarded but not explicitly closed searchers/readers are
handled.  I think they used to get garbage collected, which closed
their open files, but now they need to be closed explicitly.  It
sounds to me like you are opening new searchers/readers without
closing the old ones.
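A minimal, stdlib-only sketch of the pattern described above; SearcherHolder and refresh are hypothetical names, with java.io.Closeable standing in for a real Lucene IndexSearcher/IndexReader.  The point is simply that swapping in a new searcher must also close the one it replaces:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Holds the current searcher. refresh() installs a replacement and
// explicitly closes the previous one instead of leaving it to the GC.
class SearcherHolder<T extends Closeable> {
    private final AtomicReference<T> current = new AtomicReference<>();

    T get() {
        return current.get();
    }

    void refresh(T fresh) throws IOException {
        T old = current.getAndSet(fresh);
        if (old != null) {
            old.close();  // releases the old searcher's open segment files
        }
    }
}
```

Without the explicit close(), each refresh leaves another set of file descriptors behind, which matches the symptom described in this thread.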


--
Ian.


On Fri, Jan 6, 2012 at 6:50 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> Can you show the code? In particular are you re-opening
> the index writer?
>
> Bottom line: This isn't a problem anyone expects
> in 3.1 absent some programming error on your
> part, so it's hard to know what to say without
> more information.
>
> 3.1 has other problems if you use spellcheck.collate; if you use that
> feature you might want to upgrade to at least 3.3. But I truly believe
> this is irrelevant to your problem.
>
> Best
> Erick
>
>
> On Fri, Jan 6, 2012 at 1:25 PM, Charlie Hubbard
> <charlie.hubbard@gmail.com> wrote:
>> Thanks for the reply.  I'm still having trouble.  I've made some changes
>> to use commit rather than close, but I'm not seeing much change in the
>> seemingly ever-increasing count of open file handles.  I'm developing on
>> Mac OS X 10.6 and testing on Linux CentOS 4.5.  My biggest problem is I
>> can't tell why lsof says this process has so many open files.  I'm
>> seeing the same files opened more than once, and I'm seeing files in the
>> lsof output that don't exist on the file system.  For example, here is
>> the lucene directory:
>>
>> -rw-r--r-- 1 root root  328396 Jan 5 20:21 _ly.fdt
>> -rw-r--r-- 1 root root    6284 Jan 5 20:21 _ly.fdx
>> -rw-r--r-- 1 root root    2253 Jan 5 20:21 _ly.fnm
>> -rw-r--r-- 1 root root  234489 Jan 5 20:21 _ly.frq
>> -rw-r--r-- 1 root root   15704 Jan 5 20:21 _ly.nrm
>> -rw-r--r-- 1 root root 1113954 Jan 5 20:21 _ly.prx
>> -rw-r--r-- 1 root root    5421 Jan 5 20:21 _ly.tii
>> -rw-r--r-- 1 root root  445988 Jan 5 20:21 _ly.tis
>> -rw-r--r-- 1 root root  118262 Jan 6 09:56 _nx.cfs
>> -rw-r--r-- 1 root root   10009 Jan 6 10:00 _ny.cfs
>> -rw-r--r-- 1 root root      20 Jan 6 10:00 segments.gen
>> -rw-r--r-- 1 root root     716 Jan 6 10:00 segments_kw
>>
>>
>> And here is an excerpt from: lsof -p 19422 | awk -- '{print $9}' | sort
>>
>> ...
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
>> ...
>>
>> As you can see, none of those files actually exist.  Not only that, but
>> they are opened 8 or 9 times each.  There are tons of these
>> non-existent, repeatedly opened files in the output.  So why are the
>> handles being counted as open?
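One likely explanation (standard POSIX behavior, not specific to Lucene): deleting a file only removes its directory entry, and the inode stays alive, still counted by lsof, until the last open descriptor on it is closed.  A small stdlib-only Java sketch of the effect, with hypothetical names:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DeletedButOpen {
    // Writes 3 bytes, opens the file, deletes it, then reads it anyway.
    // On Linux/macOS the read succeeds: the OS keeps the inode alive
    // while the FileInputStream holds a descriptor, which is exactly
    // why stale segment files still show up in lsof output.
    static int readAfterDelete() throws IOException {
        File f = File.createTempFile("segment", ".cfs");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {1, 2, 3});
        }
        FileInputStream in = new FileInputStream(f);
        boolean deleted = f.delete();  // unlink while the handle is open
        int count = 0;
        while (in.read() != -1) {
            count++;  // still readable after the unlink
        }
        in.close();  // only now does the OS free the inode
        return deleted ? count : -1;
    }
}
```

So the "non-existent" files in the lsof output are almost certainly old segment files that Lucene deleted but that an unclosed reader or searcher still holds open.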
>>
>> I have a single IndexWriter and a single IndexSearcher open on a single CFS
>> directory.  The writer is only used by a single thread, but IndexSearcher
>> can be shared among several threads.  I still think something has changed
>> in 3.1 that's causing this.  I hope you can help me understand how it's not.
>>
>> Charlie
>>
>> On Mon, Jan 2, 2012 at 3:03 PM, Simon Willnauer <
>> simon.willnauer@googlemail.com> wrote:
>>
>>> hey charlie,
>>>
>>> there are a couple of wrong assumptions in your last email, mostly
>>> related to merging. mergeFactor = 10 doesn't mean that you end up
>>> with one file, nor is it directly about files. My first guess is
>>> that you are using the compound file format (CFS), in which case each
>>> segment corresponds to a single file. The merge factor relates to
>>> segments and triggers segment merges based on their size (either in
>>> bytes or in documents). For more details see this blog:
>>>
>>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>>>
>>> If you are using CFS, one segment is one file. In 3.1, however, CFS is
>>> only used if the target segment is smaller than the noCFSRatio allows:
>>> segments bigger than that fraction of the existing index (by default
>>> 0.1 -> 10%) are not packed into CFS.
>>>
>>> This means your index might create non-CFS segments with multiple
>>> files (10 in the worst case... maybe I missed one, but anyway...),
>>> which means the number of open files increases.
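A back-of-the-envelope sketch of why hundreds of descriptors are plausible; the numbers here (roughly 10 files per non-CFS segment, a handful of merge levels) are illustrative assumptions, not exact Lucene internals:

```java
class OpenFileEstimate {
    // levels: merge levels currently present in the index
    // mergeFactor: max segments per level before a merge is triggered
    // filesPerSegment: ~10 for a non-CFS segment, 1 for a CFS segment
    static int worstCase(int levels, int mergeFactor, int filesPerSegment) {
        return levels * mergeFactor * filesPerSegment;
    }

    public static void main(String[] args) {
        // Three levels of non-CFS segments can already mean ~300 open
        // files, the same order of magnitude reported in this thread.
        System.out.println(OpenFileEstimate.worstCase(3, 10, 10)); // 300
    }
}
```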
>>>
>>> This is only a guess since I don't know what you are doing with your
>>> index readers etc. Which platform are you on, and what is the file
>>> descriptor limit? In general it's ok to raise the FD limit on your OS
>>> and just let Lucene do its job. If you are restricted in any way, you
>>> can set LogMergePolicy#setNoCFSRatio(double) to 1.0 and see whether
>>> you are still seeing the problem.
>>>
>>> About commit vs. close - in general it's not a good idea to close your
>>> IW at all. I'd keep it open as long as you can and commit when needed.
>>> Even optimize is somewhat overrated and should be used with care or
>>> not at all... (here is another writeup regarding optimize:
>>>
>>> http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
>>> )
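A sketch of the keep-the-writer-open pattern suggested here, with the writer hidden behind a hypothetical Committable interface rather than a real IndexWriter: index continuously, commit every N documents, never close:

```java
import java.io.IOException;

// Stand-in for the commit() side of a long-lived IndexWriter.
interface Committable {
    void commit() throws IOException;
}

// Keeps one writer open for the life of the process and commits every
// batchSize documents instead of closing after each pushed document.
class BatchingIndexer {
    private final Committable writer;
    private final int batchSize;
    private int pending = 0;

    BatchingIndexer(Committable writer, int batchSize) {
        this.writer = writer;
        this.batchSize = batchSize;
    }

    // Called once per pushed document; in real code the
    // addDocument call would go right before this bookkeeping.
    void docAdded() throws IOException {
        pending++;
        if (pending >= batchSize) {
            writer.commit();  // changes are durable, writer stays open
            pending = 0;
        }
    }
}
```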
>>>
>>>
>>> hope that helps,
>>>
>>> simon
>>>
>>>
>>> On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard
>>> <charlie.hubbard@gmail.com> wrote:
>>> > I'm beginning to think there is an issue with 3.1 that's causing
>>> > this. After looking over my code again, I forgot that the mechanism
>>> > that does the indexing hasn't changed, and the index IS being closed
>>> > between cycles, even when using push vs pull. This code used to work
>>> > on 2.x Lucene, but I had to upgrade it. It had been very stable under
>>> > 2.x, but after upgrading to 3.1 I started seeing this problem. I
>>> > double checked the code doing the indexing, and it hasn't changed
>>> > since I upgraded to 3.1. So the constant in this equation is mostly
>>> > my code; what's different is 3.1. Furthermore, when new documents are
>>> > pulled in through the old mechanism the open file count continues to
>>> > rise. Over a 24-hour period it has grown by +296 files, with only 10
>>> > or 12 documents indexed.
>>> >
>>> > So is this a known issue?  Should I upgrade to a newer version to
>>> > fix this?
>>> >
>>> > Thanks
>>> > Charlie
>>> >
>>> > On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard
>>> > <charlie.hubbard@gmail.com>wrote:
>>> >
>>> >> I have a program I recently converted from a pull scheme to a push
>>> >> scheme.  Previously I was pulling down the documents I was indexing,
>>> >> and when I was done I'd close the IndexWriter at the end of each
>>> >> iteration.  Now that I've converted to a push scheme I'm sent the
>>> >> documents to index, and I write them.  However, this means I'm not
>>> >> closing the IndexWriter, since closing after every document would
>>> >> perform poorly.  Instead I'm keeping the IndexWriter open all the
>>> >> time.  The problem is that after a while the number of open files
>>> >> continues to rise.  I've set the following parameters on the
>>> >> IndexWriter:
>>> >>
>>> >> merge.factor=10
>>> >> max.buffered.docs=1000
>>> >>
>>> >> After going over the API docs I thought this would mean it'd never
>>> >> create more than 10 files before merging them into a single file,
>>> >> but it's creating hundreds of files.  Since I'm not closing the
>>> >> IndexWriter, will it still merge the files?  From reading the API
>>> >> docs it sounded like merging happens regardless of flushing, commit,
>>> >> or close.  Is that true?  I've measured the files that are
>>> >> increasing, and they are the files associated with this one index
>>> >> I'm leaving open.  I have another index that I do close
>>> >> periodically, and it's not growing like this one.
>>> >>
>>> >> I've read some posts about using commit() instead of close() in
>>> >> situations like this because of its better performance.  However,
>>> >> commit() just flushes to disk rather than flushing and optimizing
>>> >> like close().  I'm not sure whether commit() is what I need.  Any
>>> >> suggestions?
>>> >>
>>> >> Thanks
>>> >> Charlie
>>> >>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>


