lucene-java-user mailing list archives

From "Mark Modrall" <MModr...@glgroup.com>
Subject RE: More frustration with Lucene/Java file i/o on Windows
Date Fri, 18 Aug 2006 21:53:55 GMT
Hi Mike,

	I do appreciate the thoroughness and graciousness of your
responses, and I hope there's nothing in my frustration that you would
take personally.  Googling around, I've found other reports that the
Sun JVM's handling of the Windows file system is, well, quixotic at
best.

	In our current system, we have two modes of operation: full
index recreation and incremental indexing.  Which to use is determined
by a quick validation check: see if the path exists and is a directory,
and if it is, make an IndexSearcher to check the metadata as below.  If
the reader passes the test, we build incrementally; otherwise we delete
the directory and start fresh.

  searcher = new IndexSearcher(FSDirectory.getDirectory(indexFile, false));
  TermQuery tq = new TermQuery(new Term(METADATA_DOCUMENT_FIELD,
                                        METADATA_DOCUMENT_FIELD_VALUE));
  Hits h = searcher.search(tq);

	The validation IndexSearcher gets closed in a finally block, so
there shouldn't be anything left over from that.

	If it's a full rebuild, we just have an IndexWriter (no reader).
If it's incremental, there's an IndexReader to delete old documents,
which is closed, followed by an IndexWriter that is also closed (when
things go well).

	I haven't gone looking in the source to figure out what goes
into the middle of the lucene-<xxx>-write.lock naming convention, but as
you say they could have been left over from some abnormal termination.

	Our indexing scheme bats back and forth between 2 build dirs;
one's supposed to be the last successful build, the other is the one you
can work on.  When a successful build is finished, all the files are
copied over into the scratch dir and the next build goes in the scratch
dir.  If part of the glorp in the lock file name is a hash of the
directory path, we could run for a while and not hit the locking issue
for a couple of builds.

	I still can't figure out how the .cfs file delete would fail,
though, unless the IndexSearcher.close() hadn't really let go of the
file.  What would happen with an IndexSearcher on a malformed directory?
I.e. if there was only a .cfs file there?  Would .close() know to
release the one handle it had?
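A minimal sketch of one workaround for a delete that fails while a handle is still open, following the System.gc() suggestion quoted further down in this message. RetryingDelete and deleteWithRetry are hypothetical names, not part of Lucene, and the GC call is only a hint, so this may not help if a reader really was left unclosed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Lucene class): delete a file with a few retries. */
final class RetryingDelete {
    private RetryingDelete() {}

    /**
     * Tries to delete the file up to maxAttempts times.
     * Returns true once the file no longer exists, false if every attempt failed.
     */
    static boolean deleteWithRetry(Path file, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(file);
                return true; // deleted, or it was never there
            } catch (IOException e) {
                // On Windows this is typically what happens while a handle is open.
                if (attempt == maxAttempts) {
                    return false;
                }
                System.gc();               // a hint only; the JVM may ignore it
                Thread.sleep(pauseMillis); // give the OS a moment before retrying
            }
        }
        return false; // only reached when maxAttempts <= 0
    }
}
```

A false return is a signal to go looking for an unclosed reader rather than something to retry forever.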

	Anyway, I'll implement something at the root to delete the lock
files before starting to do anything to make sure the slate is clean and
cross my fingers.
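A rough sketch of that cleanup step, assuming the lock files follow the lucene-<xxx>-write.lock pattern mentioned above and that the caller knows which directory they live in. StaleLockCleaner is a hypothetical name, not a Lucene class, and this is only safe when no other Lucene process is using the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class): remove leftover write locks. */
final class StaleLockCleaner {
    private StaleLockCleaner() {}

    /**
     * Deletes every regular file in lockDir whose name matches
     * lucene-*-write.lock and returns the paths that were removed.
     * Only call this when no Lucene process is running against the index.
     */
    static List<Path> deleteWriteLocks(Path lockDir) throws IOException {
        List<Path> deleted = new ArrayList<>();
        try (DirectoryStream<Path> locks =
                 Files.newDirectoryStream(lockDir, "lucene-*-write.lock")) {
            for (Path lock : locks) {
                if (Files.isRegularFile(lock) && Files.deleteIfExists(lock)) {
                    deleted.add(lock);
                }
            }
        }
        return deleted;
    }
}
```

Logging the returned paths at startup would also show how often a previous run ended abnormally.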

Thanks
-Mark




 


-----Original Message-----

From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Friday, August 18, 2006 3:11 PM
To: java-user@lucene.apache.org
Subject: Re: More frustration with Lucene/Java file i/o on Windows


>             It was a little comforting to know that other people have
> seen Windows Explorer refreshes crash java Lucene on Windows.  We seem
> to be running into a long list of file system issues with Lucene, and I
> was wondering if other people had noticed these sorts of things (and
> hopefully any tips and tricks for working around them).

Sorry you're having so many troubles.  Keep these emails, questions &
issues coming because this is how we [gradually] fix Lucene to be more
robust!

OK a few quick possibilities / suggestions:

   * Make sure in your Indexer.java that when you delete docs, you
     close any open IndexWriters before you try to call
     deleteDocuments from your IndexReader.  Only one writer
     (IndexWriter adding docs or IndexReader deleting docs) can be open
     at once and if you fail to do this you'll get exactly that "lock
     obtain timed out" error.  You could also use IndexModifier which
     under the hood is doing this open-close logic for you.  But: try
     to buffer up adds and deletes together if possible to minimize
     cost of open/closes.

   * That one file really seems to have an open file handle on it.  Are
     you sure you called close on all IndexReaders (IndexSearchers)?
     That file is a "compound file format" segment, and IndexReaders
     hold an open file handle to these files (IndexWriters do as well,
     but they quickly close the file handles after writing to them).

   * There was a thread recently, similar to this issue, where
     File.renameTo was failing, and there was a suggestion that this is
     a bug in some JVMs and to get the JVM to GC (System.gc()) to see
     if that then closes the underlying file.

   * IndexSearcher.close() will only close the underlying IndexReader
     if the searcher opened it itself (i.e. you created the searcher
     with a String path or a Directory).  If you create it with just
     an IndexReader it will not close that reader.  You have to
     separately call IndexReader.close to close the reader.

   * If the JVM exited un-gracefully then the lock files will be left
     on disk and Lucene will incorrectly think the lock is held by
     another process (and then hit that "lock obtain timed out"
     error).  You can just remove the lock files (from
     c:\windows\temp\...) if you are certain no Lucene processes are
     running.

     We are working towards using native locks in Lucene (for a future
     release) so that even un-graceful exits of the JVM will properly
     free the lock.

   * Perhaps, change your "build a new index" logic so that it does so
     in an entirely fresh directory?  Just to avoid any hazards at all
     of anything holding files open in the old directory ...
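The last suggestion could be sketched as below, assuming each build gets its own uniquely named directory under a common root. FreshBuildDir and the "index-build-" prefix are hypothetical, chosen only for illustration. Nothing here touches Lucene itself; the returned path would simply be handed to whatever opens the index for writing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Lucene class): one new directory per full build. */
final class FreshBuildDir {
    private FreshBuildDir() {}

    /**
     * Creates buildRoot if needed, then creates and returns a new, empty,
     * uniquely named directory inside it. Because the directory did not
     * exist before this call, no earlier reader or writer can be holding
     * files open in it.
     */
    static Path create(Path buildRoot) throws IOException {
        Files.createDirectories(buildRoot);
        return Files.createTempDirectory(buildRoot, "index-build-");
    }
}
```

Old build directories still have to be cleaned up at some point, but that can happen later, once nothing is reading from them.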

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
