lucene-java-user mailing list archives

From Charlie Hubbard <charlie.hubb...@gmail.com>
Subject Re: Help running out of files
Date Fri, 06 Jan 2012 18:25:27 GMT
Thanks for the reply.  I'm still having trouble.  I've switched to
calling commit() instead of close(), but it hasn't made much
difference: the number of open file handles still seems to grow
without bound.  I'm developing on Mac OS X 10.6 and testing on Linux
CentOS 4.5.  My biggest problem is that I can't tell why lsof reports
so many open files for this process.  The same files show up as
opened multiple times, and files appear in the lsof output that no
longer exist on the file system.  For example, here is the lucene
directory:

-rw-r--r-- 1 root root  328396 Jan  5 20:21 _ly.fdt
-rw-r--r-- 1 root root    6284 Jan  5 20:21 _ly.fdx
-rw-r--r-- 1 root root    2253 Jan  5 20:21 _ly.fnm
-rw-r--r-- 1 root root  234489 Jan  5 20:21 _ly.frq
-rw-r--r-- 1 root root   15704 Jan  5 20:21 _ly.nrm
-rw-r--r-- 1 root root 1113954 Jan  5 20:21 _ly.prx
-rw-r--r-- 1 root root    5421 Jan  5 20:21 _ly.tii
-rw-r--r-- 1 root root  445988 Jan  5 20:21 _ly.tis
-rw-r--r-- 1 root root  118262 Jan  6 09:56 _nx.cfs
-rw-r--r-- 1 root root   10009 Jan  6 10:00 _ny.cfs
-rw-r--r-- 1 root root      20 Jan  6 10:00 segments.gen
-rw-r--r-- 1 root root     716 Jan  6 10:00 segments_kw


And here is an excerpt from: lsof -p 19422 | awk -- '{print $9}' | sort

...
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
/usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
...

As you can see, none of those files actually exists on the file
system anymore, yet each one shows up as open 8 or 9 times.  The
output is full of these non-existent, repeatedly opened files.  So
why are the handles still being counted as open?
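
For what it's worth, counting the duplicates directly shows the same
thing:

  lsof -p 19422 | awk '{print $9}' | sort | uniq -c | sort -rn | head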

I have a single IndexWriter and a single IndexSearcher open on a
single compound-file (CFS) directory.  The writer is only used by one
thread, but the IndexSearcher can be shared among several threads.  I
still think something changed in 3.1 that's causing this; if I'm
wrong, I hope you can help me understand why.
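
For reference, the setup is roughly this (a simplified sketch of what
the real code does; the analyzer and error handling are omitted):

  Directory dir = FSDirectory.open(
      new File("/usr/local/emailarchive/mailarchive/lucene/indexes/mail"));
  IndexWriter writer = new IndexWriter(dir,
      new IndexWriterConfig(Version.LUCENE_31, analyzer)); // one writer thread only
  IndexSearcher searcher =
      new IndexSearcher(IndexReader.open(dir)); // shared across threads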

Charlie

On Mon, Jan 2, 2012 at 3:03 PM, Simon Willnauer <
simon.willnauer@googlemail.com> wrote:

> hey charlie,
>
> there are a couple of wrong assumptions in your last email, mostly
> related to merging. mergeFactor = 10 doesn't mean that you end up
> with one file, and it isn't about files at all. My first guess is
> that you are using the compound file format (CFS), in which each
> segment corresponds to a single file. The merge factor relates to
> segments and controls when segment merges are triggered, based on
> segment size (either in bytes or in documents). For more details see
> this blog:
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> If you are using CFS, one segment is one file. In 3.1, CFS is only
> used if the target segment is smaller than the noCFSRatio (0.1 by
> default, i.e. 10%) of the existing index; segments bigger than that
> fraction of the index are not packed into CFS.
>
> This means your index may create non-CFS segments consisting of
> multiple files (around 10 per segment in the worst case... maybe I
> missed one, but anyway...), which means the number of open files
> increases.
>
> This is only a guess, since I don't know what you are doing with
> your index readers etc. Which platform are you on, and what is the
> file descriptor limit? In general it's fine to raise the FD limit on
> your OS and just let Lucene do its job. If you are restricted in
> some way, you can set LogMergePolicy#setNoCFSRatio(double) to 1.0
> and see whether you are still seeing the problem.
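>
> As a quick sketch (assuming the 3.1 IndexWriterConfig API, with
> 'dir' and 'analyzer' being your existing directory and analyzer),
> that would look something like:
>
>   LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
>   mp.setNoCFSRatio(1.0); // always pack new segments into compound files
>   IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31, analyzer);
>   cfg.setMergePolicy(mp);
>   IndexWriter writer = new IndexWriter(dir, cfg);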
>
> About commit vs. close - in general it's not a good idea to close
> your IW at all. I'd keep it open as long as you can and commit when
> needed. Even optimize is somewhat overrated and should be used with
> care, or not at all... (here is another writeup regarding optimize:
>
> http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
> )
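>
> In code the pattern is simply this (a sketch; 'writer' is your
> single long-lived instance):
>
>   writer.addDocument(doc); // index documents as they are pushed in
>   writer.commit();         // make changes durable; writer stays open
>   // writer.close() only at shutdown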
>
>
> hope that helps,
>
> simon
>
>
> On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard
> <charlie.hubbard@gmail.com> wrote:
> > I'm beginning to think there is an issue with 3.1 that's causing
> > this. After looking over my code again, I realized the mechanism
> > that does the indexing hasn't changed, and the index IS being
> > closed between cycles, even when using push vs. pull. This code
> > used to work on Lucene 2.x, but I had to upgrade it. It was very
> > stable under 2.x, yet after upgrading to 3.1 I started seeing this
> > problem. I double-checked the indexing code, and it hasn't changed
> > since the upgrade, so the constant in this equation is my code;
> > what's different is 3.1. Furthermore, when new documents are
> > pulled in through the old mechanism, the open file count continues
> > to rise. Over a 24-hour period it grew by 296 files, with only 10
> > or 12 documents indexed.
> >
> > So is this a known issue?  Should I upgrade to a newer version to fix this?
> >
> > Thanks
> > Charlie
> >
> > On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard
> > <charlie.hubbard@gmail.com>wrote:
> >
> >> I have a program I recently converted from a pull scheme to a
> >> push scheme. Previously I pulled down the documents I was
> >> indexing and closed the IndexWriter at the end of each iteration.
> >> Now that I've converted to a push scheme, I'm sent the documents
> >> to index and I write them as they arrive. That means I'm no
> >> longer closing the IndexWriter, since closing after every
> >> document would perform poorly; instead I keep the IndexWriter
> >> open all the time. The problem is that after a while the number
> >> of open files keeps rising. I've set the following parameters on
> >> the IndexWriter:
> >>
> >> merge.factor=10
> >> max.buffered.docs=1000
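> >>
> >> (In Lucene terms those map onto something like the following, a
> >> sketch of my wrapper code assuming the 3.1 setters:
> >>
> >>   LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
> >>   mp.setMergeFactor(10);        // merge.factor
> >>   IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31, analyzer);
> >>   cfg.setMaxBufferedDocs(1000); // max.buffered.docs
> >>   cfg.setMergePolicy(mp);
> >> )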
> >>
> >> After going over the API docs I thought this would mean it would
> >> never create more than 10 files before merging them into a single
> >> file, but it's creating hundreds of files. Since I'm not closing
> >> the IndexWriter, will it still merge the files? From reading the
> >> API docs it sounded like merging happens regardless of flushing,
> >> commit, or close. Is that true? I've tracked which files are
> >> increasing, and they all belong to the one index I'm leaving
> >> open. I have another index that I do close periodically, and it's
> >> not growing like this one.
> >>
> >> I've read some posts about using commit() instead of close() in
> >> situations like this because of its better performance. However,
> >> commit() just flushes to disk rather than flushing and optimizing
> >> like close(). I'm not sure whether commit() is what I need. Any
> >> suggestions?
> >>
> >> Thanks
> >> Charlie
> >>
>
