lucene-java-user mailing list archives

From Charlie Hubbard <charlie.hubb...@gmail.com>
Subject Re: Help running out of files
Date Sat, 07 Jan 2012 23:56:19 GMT
OK, I think I've fixed my original problem by converting everything to use
commit() and never calling close() except when the server shuts down.  This
means I'm not closing my IndexWriter or IndexSearcher after opening them.
I periodically call commit() on the IndexWriter after indexing my
documents.  However, my new issue is that changes made to the index aren't
reflected except the first time commit() is called.  The 2nd, 3rd, etc.
calls to commit() never show up when doing searches.  I reread the API
docs, and it seems like this should work.

My client code is the following:

   SearchIndex indexer = ...;
   if( incomingFiles != null ) {
      for( File incoming : incomingFiles ) {
         indexer.process( incoming );
      }
      indexer.commit();
   }

Here is the excerpt from my SearchIndex class:

public class SearchIndex {
    ...
    protected IndexSearcher getSearcher() throws IOException {
        synchronized( searchLock ) {
            if( searcher == null ) {
                List<IndexReader> readers = new ArrayList<IndexReader>();
                for( MailDrop drop : store.getMailDrops() ) {
                    File index = drop.getIndex(indexName);
                    if( index.exists() ) {
                        readers.add( IndexReader.open( FSDirectory.open(index), true ) ); // read-only
                    }
                }
                if( logger.isDebugEnabled() ) logger.debug("Opening searcher: " + indexName );
                // closeSubReaders = true: closing the MultiReader closes the subreaders
                searcher = new IndexSearcher( new MultiReader( readers.toArray( new IndexReader[readers.size()] ), true ) );
            }
            return searcher;
        }
    }

    public IndexWriter getWriter() throws IOException, InterruptedException {
        synchronized( writerLock ) {
            if( writer == null ) {
                if( reader != null ) { // is someone currently deleting?  Then wait().
                    writerLock.wait();
                }
                MailDrop mailDrop = store.getActiveMailDrop();
                if( mailDrop == null ) return null;
                writer = createWriter( mailDrop.getIndex(indexName) );
            }
            return writer;
        }
    }

    private IndexWriter createWriter(File index) throws IOException {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);

        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_31, analyzer);
        if( mergeFactor != null ) {
            ((LogMergePolicy)config.getMergePolicy()).setMergeFactor( mergeFactor );
        }

        if( logger.isDebugEnabled() ) {
            logger.debug("Opening writer for " + index.getAbsolutePath() );
        }
        return new IndexWriter( FSDirectory.open(index), config );
    }

    public IndexReader getReader() throws IOException, InterruptedException {
        synchronized( writerLock ) {
            if( reader == null ) {
                if( writer != null ) { // is someone currently writing?  Then wait().
                    writerLock.wait();
                }
                List<IndexReader> readers = new ArrayList<IndexReader>( store.getMailDrops().size() );
                for( MailDrop drop : store.getMailDrops() ) {
                    File index = drop.getIndex(indexName);
                    if( !drop.isReadOnly() && index.exists() ) {
                        readers.add( IndexReader.open( FSDirectory.open(index), true ) );
                    }
                }
                reader = new MultiReader( readers.toArray( new IndexReader[readers.size()] ) );
            }
            return reader;
        }
    }

    public boolean isIndexWritable() {
        for( MailDrop drop : store.getMailDrops() ) {
            File index = drop.getIndex(indexName);
            if( !drop.isReadOnly() && index.exists() ) {
                return true;
            }
        }
        return false;
    }

    public void close() throws IOException {
        closeWriter();
        closeSearcher();
    }

    public void commit() throws IOException {
        try {
            if( writer != null ) {
                if( logger.isDebugEnabled() ) logger.debug("Committing changes to the index " + indexName );
                writer.commit();
            }
        } catch( OutOfMemoryError e ) {
            logger.error("Out of memory while committing index.  Index will be closed: " + indexName, e);
            closeWriter();
        }
    }

    private void closeWriter() throws IOException {
        synchronized( writerLock ) {
            try {
                if( writer != null ) {
                    logger.debug("Closing the writer for " + indexName);
                    writer.close();
                }
                writerLock.notifyAll();
            } finally {
                writer = null;
            }
        }
    }

    private void closeSearcher() throws IOException {
        synchronized( searchLock ) {
            if( searcher != null && activeSearches.get() == 0 ) {
                try {
                    logger.debug( "Closing the searcher for " + indexName );
                    searcher.close();
                } finally {
                    searcher = null;
                }
            }
            searchLock.notifyAll();
        }
    }

    public void closeReader() throws IOException, InterruptedException {
        synchronized( writerLock ) {
            if( reader != null ) {
                try {
                    reader.close();
                } finally {
                    reader = null;
                }
                optimize();
            }
            writerLock.notifyAll();
        }
    }
}

Just so you are aware of what my original problem was (though I don't think
I was doing anything incorrect here), I had some code in getSearcher() doing
the following:

    protected IndexSearcher getSearcher() throws IOException {
        synchronized( searchLock ) {
            if( searcher == null || ( activeSearches.get() == 0 &&
                    !searcher.getIndexReader().isCurrent() ) ) {    // <<<<<< problem
                if( searcher != null ) {
                    logger.debug("Closing the searcher for " + indexName);
                    searcher.close();
                }
                List<IndexReader> readers = new ArrayList<IndexReader>();
                for( MailDrop drop : store.getMailDrops() ) {
                    File index = drop.getIndex(indexName);
                    if( index.exists() ) {
                        readers.add( IndexReader.open( FSDirectory.open(index), true ) );
                    }
                }
                searcher = new IndexSearcher( new MultiReader( readers.toArray( new IndexReader[readers.size()] ), true ) );
                searcherTimestamp = System.currentTimeMillis();
            }
            return searcher;
        }
    }

That isCurrent() check would sometimes report the index as out of date, so
the searcher was closed and reopened.  I changed it to only use commit() and
never close, and it doesn't leak files, but now I'm not seeing the changes.
What I don't understand is why calling close() would leak files.  I've
double-checked in my logs that close() was definitely being called.  Again,
this code was the same under 2.4 and didn't leak files, but under 3.1 it
leaks files.
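
Maybe the middle ground is reopening rather than closing and opening from
scratch.  A minimal sketch of what I mean, assuming 3.1's
IndexReader.reopen(), which returns a new reader only when the index has
changed and shares the unchanged subreaders with the old one:

    IndexReader current = searcher.getIndexReader();
    IndexReader reopened = current.reopen();
    if( reopened != current ) {
        searcher = new IndexSearcher( reopened );
        current.close(); // ref counting keeps the shared subreaders open
    }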

Thanks for your help,

Charlie

On Fri, Jan 6, 2012 at 3:06 PM, Ian Lea <ian.lea@gmail.com> wrote:

> Something that did change at some point, can't remember when, was the
> way that discarded but not explicitly closed searchers/readers are
> handled.  I think that they used to get garbage collected, causing
> open files to be closed, but now need to be explicitly closed.  Sounds
> to me like you are opening new searchers/readers without closing old
> ones.
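>
> For example, when swapping in a new searcher, close the old reader
> explicitly (a rough sketch; as far as I know IndexSearcher.close() does
> not close an IndexReader that you passed in yourself):
>
>     IndexReader old = searcher.getIndexReader();
>     searcher = new IndexSearcher(newReader); // newReader = freshly opened reader
>     old.close(); // without this the old segment files stay open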
>
>
> --
> Ian.
>
>
> On Fri, Jan 6, 2012 at 6:50 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> > Can you show the code? In particular are you re-opening
> > the index writer?
> >
> > Bottom line: This isn't a problem anyone expects
> > in 3.1 absent some programming error on your
> > part, so it's hard to know what to say without
> > more information.
> >
> > 3.1 has other problems if you use spellcheck.collate; if you use that
> > feature you might want to upgrade to at least 3.3.  But I truly
> > believe this is irrelevant to your problem.
> >
> > Best
> > Erick
> >
> >
> > On Fri, Jan 6, 2012 at 1:25 PM, Charlie Hubbard
> > <charlie.hubbard@gmail.com> wrote:
> >> Thanks for the reply.  I'm still having trouble.  I've made some
> >> changes to use commit over close, but I'm not seeing much change in
> >> what seems like ever-increasing open file handles.  I'm developing on
> >> Mac OS X 10.6 and testing on Linux CentOS 4.5.  My biggest problem is
> >> I can't tell why lsof is saying this process has this many open files.
> >> I'm seeing the same files opened more than once, and I'm seeing files
> >> show up in lsof output that don't exist on the file system.  For
> >> example, here is the lucene directory:
> >>
> >>> -rw-r--r-- 1 root root  328396 Jan 5 20:21 _ly.fdt
> >>> -rw-r--r-- 1 root root    6284 Jan 5 20:21 _ly.fdx
> >>> -rw-r--r-- 1 root root    2253 Jan 5 20:21 _ly.fnm
> >>> -rw-r--r-- 1 root root  234489 Jan 5 20:21 _ly.frq
> >>> -rw-r--r-- 1 root root   15704 Jan 5 20:21 _ly.nrm
> >>> -rw-r--r-- 1 root root 1113954 Jan 5 20:21 _ly.prx
> >>> -rw-r--r-- 1 root root    5421 Jan 5 20:21 _ly.tii
> >>> -rw-r--r-- 1 root root  445988 Jan 5 20:21 _ly.tis
> >>> -rw-r--r-- 1 root root  118262 Jan 6 09:56 _nx.cfs
> >>> -rw-r--r-- 1 root root   10009 Jan 6 10:00 _ny.cfs
> >>> -rw-r--r-- 1 root root      20 Jan 6 10:00 segments.gen
> >>> -rw-r--r-- 1 root root     716 Jan 6 10:00 segments_kw
> >>
> >>
> >> And here is an excerpt from: lsof -p 19422 | awk -- '{print $9}' | sort
> >>
> >> ...
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lp.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lq.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lr.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_ls.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
> >> /usr/local/emailarchive/mailarchive/lucene/indexes/mail/_lt.cfs
> >> ...
> >>
> >> As you can see, none of those files actually exist.  Not only that,
> >> they are opened 8 or 9 times each.  There are tons of these
> >> non-existent, repeatedly opened files in the output.  So why are the
> >> handles being counted as open?
> >>
> >> I have a single IndexWriter and a single IndexSearcher open on a
> >> single CFS directory.  The writer is only used by a single thread, but
> >> the IndexSearcher can be shared among several threads.  I still think
> >> something changed in 3.1 that's causing this; if not, I hope you can
> >> help me understand why.
> >>
> >> Charlie
> >>
> >> On Mon, Jan 2, 2012 at 3:03 PM, Simon Willnauer <
> >> simon.willnauer@googlemail.com> wrote:
> >>
> >>> hey charlie,
> >>>
> >>> there are a couple of wrong assumptions in your last email, mostly
> >>> related to merging.  mergeFactor = 10 doesn't mean that you end up
> >>> with one file, nor is it directly related to files.  My first guess
> >>> is that you are using the CompoundFileSystem (CFS), so each segment
> >>> corresponds to a single file.  The merge factor relates to segments
> >>> and is responsible for triggering segment merges by their size
> >>> (either in bytes or in documents).  For more details see this blog:
> >>>
> >>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> >>>
> >>> If you are using CFS, one segment is one file.  In 3.1, CFS is only
> >>> used if the target segment is smaller than the noCFSRatio fraction of
> >>> the existing index (by default 0.1 -> 10%); segments bigger than that
> >>> fraction are not packed into CFS.
> >>>
> >>> This means your index might create non-CFS segments with multiple
> >>> files (10 in the worst case... maybe I missed one, but anyway...),
> >>> which means the number of open files increases.
> >>>
> >>> This is only a guess since I don't know what you are doing with your
> >>> index readers etc.  Which platform are you on and what is the file
> >>> descriptor limit?  In general it's OK to raise the FD limit on your
> >>> OS and just let Lucene do its job.  If you are restricted in any way,
> >>> you can set LogMergePolicy#setNoCFSRatio(double) to 1.0 and see if
> >>> you are still seeing the problem.
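> >>>
> >>> for example (a sketch against the 3.1 config API; 'analyzer' is
> >>> whatever analyzer you already use):
> >>>
> >>>     IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_31, analyzer);
> >>>     ((LogMergePolicy) config.getMergePolicy()).setNoCFSRatio(1.0); // always use CFS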
> >>>
> >>> About commit vs. close - in general it's not a good idea to close
> >>> your IW at all.  I'd keep it open as long as you can and commit when
> >>> needed.  Even optimize is somewhat overrated and should be used with
> >>> care or not at all... (here is another writeup regarding optimize:
> >>>
> >>> http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
> >>> )
> >>>
> >>>
> >>> hope that helps,
> >>>
> >>> simon
> >>>
> >>>
> >>> On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard
> >>> <charlie.hubbard@gmail.com> wrote:
> >>> > I'm beginning to think there is an issue with 3.1 that's causing
> >>> > this.  After looking over my code again, I forgot that the
> >>> > mechanism that does the indexing hasn't changed, and the index IS
> >>> > being closed between cycles, even when using push vs. pull.  This
> >>> > code used to work on 2.x Lucene, but I had to upgrade it.  It had
> >>> > been very stable under 2.x, but after upgrading to 3.1 I've started
> >>> > seeing this problem.  I double-checked the code doing the indexing,
> >>> > and it hasn't changed since I upgraded to 3.1.  So the constant in
> >>> > this equation is mostly my code.  What's different is 3.1.
> >>> > Furthermore, when new documents are pulled in through the old
> >>> > mechanism the open file count continues to rise.  Over a 24-hour
> >>> > period it's grown by +296 files, with only 10 or 12 documents
> >>> > indexed.
> >>> >
> >>> > So is this a known issue?  Should I upgrade to a newer version to
> >>> > fix it?
> >>> >
> >>> > Thanks
> >>> > Charlie
> >>> >
> >>> > On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard
> >>> > <charlie.hubbard@gmail.com> wrote:
> >>> >
> >>> >> I have a program I recently converted from a pull scheme to a
> >>> >> push scheme.  Previously I was pulling down the documents I was
> >>> >> indexing, and when I was done I'd close the IndexWriter at the end
> >>> >> of each iteration.  Now that I've converted to a push scheme I'm
> >>> >> sent the documents to index, and I write them.  However, this
> >>> >> means I'm not closing the IndexWriter, since closing after every
> >>> >> document would perform poorly.  Instead I'm keeping the
> >>> >> IndexWriter open all the time.  The problem is that after a while
> >>> >> the number of open files continues to rise.  I've set the
> >>> >> following parameters on the IndexWriter:
> >>> >>
> >>> >> merge.factor=10
> >>> >> max.buffered.docs=1000
> >>> >>
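> >>> >> (A sketch of how these map onto the 3.1 config API; the names
> >>> >> above are just my own property keys:)
> >>> >>
> >>> >>     IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_31, analyzer);
> >>> >>     config.setMaxBufferedDocs(1000);   // max.buffered.docs
> >>> >>     ((LogMergePolicy) config.getMergePolicy()).setMergeFactor(10); // merge.factor
> >>> >>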
> >>> >> After going over the API docs I thought this would mean it'd
> >>> >> never create more than 10 files before merging them into a single
> >>> >> file, but it's creating hundreds of files.  Since I'm not closing
> >>> >> the IndexWriter, will it still merge the files?  From reading the
> >>> >> API docs it sounded like merging happens regardless of flushing,
> >>> >> commit, or close.  Is that true?  I've measured the files that are
> >>> >> increasing, and they are the files associated with this one index
> >>> >> I'm leaving open.  I have another index that I do close
> >>> >> periodically, and it's not growing like this one.
> >>> >>
> >>> >> I've read some posts about using commit() instead of close() in
> >>> >> situations like this because of the faster performance.  However,
> >>> >> commit() just flushes to disk rather than flushing and optimizing
> >>> >> like close().  I'm not sure whether commit() is what I need.  Any
> >>> >> suggestions?
> >>> >>
> >>> >> Thanks
> >>> >> Charlie
> >>> >>
> >>>
