lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Lu" <chris...@gmail.com>
Subject Re: More frustration with Lucene/Java file i/o on Windows
Date Fri, 18 Aug 2006 19:17:21 GMT
Hi, Mark,

I had the same issue with Lucene when maintaining Lucene index on
Windows. It's mostly due to Windows OS can not delete a file
correctly. While some versioning can alivate the problem somehow,  my
advice is to move to Linux according to my experience.

Regarding the lock, you need to delete all locks before the indexing starts.

Regarding the leftover index, you can create a new index in a new
directory and copy the new index to your target directory.

Chris Lu
------------------------------------
Lucene Search Server for Any Databases/Applications
http://www.dbsight.net

On 8/18/06, Mark Modrall <MModrall@glgroup.com> wrote:
> Hi...
>
>
>
>             It was a little comforting to know that other people have
> seen Windows Explorer refreshes crash java Lucene on Windows.  We seem
> to be running into a long list of file system issues with Lucene, and I
> was wondering if other people had noticed these sort of things (and
> hopefully any tips and tricks for working around them).
>
>
>
>             We've got a process running as a Windows service that's
> trying to keep a set of Lucene indexes up-to-date from a database.  The
> corpus is pretty small, so we copy the last index build to a temp
> directory and then try to do an incremental index of the changes on the
> working copy.  Since our software is evolving, we've put a version
> number in the meta data of the index files which we check when we're
> starting up.  If the version numbers don't match, we scrap the whole
> thing and start over.  The problem is that java Lucene on Windows
> doesn't do "scrap" very well.
>
>
>
>             The current hypothesis is that File.renameTo, File.delete,
> and other jvm operations on windows fail if there is any other handle
> open on the file and that Lucene objects aren't closing/finalizing their
> file handles cleanly/reliably so other things blow up later.
>
>
>
>             Here's the chain of events we had in our service process
> last night:
>
> 20060817T165611.682,EDT [Indexer.java 813]: Exception occurred deleting
> document 183971: Lock obtain timed out:
> Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
>
> 20060817T165612.682,EDT [Indexer.java 813]: Exception occurred deleting
> document 257265: Lock obtain timed out:
> Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
>
> 20060817T165613.744,EDT [Indexer.java 1184]: Indexing failure, db
> changes will be rolled back and partial index deleted.
>
> java.io.IOException: Lock obtain timed out:
> Lock@C:\WINDOWS\TEMP\lucene-22b8462f0f541160a41abfdff8d52f94-write.lock
>
>             at org.apache.lucene.store.Lock.obtain(Lock.java:56)
>
>             at
> org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:489
> )
>
>             at
> org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:514)
>
>             at
> org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:541
> )
>
>             at ...Indexer.buildIncrementalIndex(Indexer.java:915)
>
> ...
>
>
>
> Why it couldn't get that temporary lock file, I don't know.  The same
> process runs continuously, and I don't know if Lucene reuses the same
> tmpnames from run to run.  If the files were left around because of
> these JVM File system errors, maybe that would explain it.
>
>
>
> The "partial index deleted" part of our message means that we did a
> recursive (java) delete of all the index directories we were working
> with.  Our attempt to clean the slate got everything but
> contact_index\_3zhe.cfs.
>
>
>
> An hour later, we come back and try to start another index build.  We
> find the directory still exists, so we try to validate the index version
> number using
>
>             searcher = new
> IndexSearcher(FSDirectory.getDirectory(indexFile, false));
>
>             TermQuery tq = new TermQuery(new
> Term(METADATA_DOCUMENT_FIELD, METADATA_DOCUMENT_FIELD_VALUE));
>
>             Hits h = searcher.search(tq);
>
>             if (h.length() == 1)
>
>             ...
>
>         finally
>
>         {
>
>             if (searcher != null)
>
>             {
>
>                 try { searcher.close(); } catch (Exception e) { /*
> ignore */ }
>
>             }
>
>         }
>
>
>
> Obviously with only the _3zhe.cfs file left, it's not a valid index, so
> the attempt to get the metadata fails.  No matter what happens, we do a
> searcher.close().  My suspicion is that IndexSearch.close() isn't really
> doing a File.close() on all the files it's using, so you have to wait
> until searcher is garbage collected and its file objects finalized
> before things will work - because immediately after this check, Lucene
> fails a full build with
>
> 20060817T175614.323,EDT [Indexer.java 843]: Error building full index
>
> java.io.IOException: Cannot delete
> \\xx.xx.xx.xx\indexbuild2\contact_index\_3zhe.cfs
>
>             at
> org.apache.lucene.store.FSDirectory.create(FSDirectory.java:198)
>
>             at
> org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:144)
>
>             at
> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:224)
>
> ...
>
>
>
> Doing a full build, Lucene does it's own attempt to clear out the
> leftovers which fails because it can't delete the file.  And we're stuck
> in this loop all night.
>
>
>
> The guy who wrote the Indexing code says the version check is the only
> place in the code where we have an IndexSearcher created using a file
> path string - that all others have a pre-existing IndexReader.  He wants
> me to try it that way instead, so that we can explicitly close the
> reader and hopefully clear that loose file handle.
>
>
>
> Sorry for the long-winded vent, but does anyone have any advice for
> getting java lucene working on windows?  Any idea why it would seize up
> on the lock files?  This service process is the only lucene process on
> the system, and the finished indexes are copied off to another server to
> serve the search requests, so it's puzzling that the daemon process
> would block itself...  Anyone know if an IndexReader.close() would do a
> better job of cleaning up the file handles than IndexSearcher.close()?
>
>
>
> Thanks
>
> -Mark
>
>
> This e-mail message, and any attachments, is intended only for the use of the individual
or entity identified in the alias address of this message and may contain information that
is confidential, privileged and subject to legal restrictions and penalties regarding its
unauthorized disclosure and use. Any unauthorized review, copying, disclosure, use or distribution
is strictly prohibited. If you have received this e-mail message in error, please notify the
sender immediately by reply e-mail and delete this message, and any attachments, from your
system. Thank you.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message