Date: Sun, 27 Aug 2006 21:27:23 -0700 (PDT)
From: "Doron Cohen (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-665) temporary file access denied on Windows

    [ http://issues.apache.org/jira/browse/LUCENE-665?page=comments#action_12430919 ]

Doron Cohen commented on LUCENE-665:
------------------------------------

> just to confirm, is it the COMMIT lock that's
> throwing these
> unhandled exceptions (not the WRITE lock)?
> If so, lockless commits would fix this.

In my tests so far, these errors appeared only for commit locks. However, I consider this a coincidence: as far as I can tell, there is nothing special about commit locks compared to write locks; in particular, both use createNewFile(). So I agree that lockless commits would prevent this, which is good, but we cannot count on the error not happening for write locks as well.

Also, the more I think about it, the more I like lockless commits; still, they would take a while to get into Lucene, while this simple fix can help now.

Last, even with lockless commits there would still be calls to createNewFile() for the write lock, as well as intensive calls to renameFile() and other file I/O operations. With safety code such as this retry logic, which is invoked only in the rare cases of these unexpected errors, some nasty failures would be avoided and more users would be happy.

> Can you provide more details on the exceptions you're seeing?
> Especially on the "cannot rename file" exception?
Here is one from my run log. It occurs at the call to optimize(), at the end of all the add-remove iterations:

    [junit] java.io.IOException: Cannot rename C:\Documents and Settings\tpowner\Local Settings\Temp\test.perf\index_24\deleteable.new to C:\Documents and Settings\tpowner\Local Settings\Temp\test.perf\index_24\deletable
    [junit]     at org.apache.lucene.store.FSDirectory.doRenameFile(FSDirectory.java:328)
    [junit]     at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:280)
    [junit]     at org.apache.lucene.index.IndexWriter.writeDeleteableFiles(IndexWriter.java:967)
    [junit]     at org.apache.lucene.index.IndexWriter.deleteSegments(IndexWriter.java:911)
    [junit]     at org.apache.lucene.index.IndexWriter.commitChanges(IndexWriter.java:872)
    [junit]     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:823)
    [junit]     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:798)
    [junit]     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:614)
    [junit]     at org.apache.lucene.index.IndexModifier.optimize(IndexModifier.java:304)
    [junit]     at org.apache.lucene.index.TestBufferedDeletesPerf.doOptimize(TestBufferedDeletesPerf.java:266)
    [junit]     at org.apache.lucene.index.TestBufferedDeletesPerf.measureInterleavedAddRemove(TestBufferedDeletesPerf.java:218)
    [junit]     at org.apache.lucene.index.TestBufferedDeletesPerf.doTestBufferedDeletesPerf(TestBufferedDeletesPerf.java:144)
    [junit]     at org.apache.lucene.index.TestBufferedDeletesPerf.testBufferedDeletesPerfCase7(TestBufferedDeletesPerf.java:134)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit]     at java.lang.reflect.Method.invoke(Method.java:585)
    [junit]     at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit]     at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit]     at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit]     at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit]     at junit.framework.TestResult.run(TestResult.java:109)
    [junit]     at junit.framework.TestCase.run(TestCase.java:118)
    [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit]     at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
    [junit] Caused by: java.io.FileNotFoundException: C:\Documents and Settings\tpowner\Local Settings\Temp\test.perf\index_24\deletable (Access is denied)
    [junit]     at java.io.FileOutputStream.open(Native Method)
    [junit]     at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    [junit]     at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    [junit]     at org.apache.lucene.store.FSDirectory.doRenameFile(FSDirectory.java:312)
    [junit]     ... 27 more

This exception, by the way, comes from the performance test for interleaved adds and removes (issue 565), so the IndexWriter line numbers here correspond to the recent patch from issue 565 (although the same errors occur with the svn head version of IndexWriter).

> It may make more sense to trap "Access Denied" in the lock.obtain,
> but then translate this into "the lock was not acquired" (ie, just return 0).
> Because, above this code is the retry logic for the lock
> (which pauses by default for 1.0 sec).

It is true that when the lock cannot be obtained, the existing retry logic in Lock.java could handle it. But when you think about it, that is not the purpose of the Lock retry logic: it is meant for the case where the lock is *really* held by someone else, and we want to wait a while and try again. That is not the case here, although the symptoms are similar.
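To make the distinction concrete, here is a minimal sketch of a polling obtain(timeout) loop of the kind described in the quoted text (one that pauses between attempts, 1.0 sec by default in Lucene). It is hypothetical code, not Lucene's actual Lock.java: the names RetryingLock, tryObtain and pollIntervalMs are assumptions for illustration. The variant shown also treats an IOException from a single attempt as retryable, as proposed in this comment, and rethrows the last IOException when the timeout expires instead of masking it.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified lock with a polling obtain(timeout) loop, loosely modeled
 * on the retry behavior discussed for Lucene's Lock.java (not the actual Lucene class).
 */
abstract class RetryingLock {
    private final long pollIntervalMs; // the thread cites a 1000 ms default pause

    protected RetryingLock(long pollIntervalMs) {
        this.pollIntervalMs = pollIntervalMs;
    }

    /** One attempt: true if acquired, false if someone else holds the lock. */
    protected abstract boolean tryObtain() throws IOException;

    /**
     * Polls until the lock is acquired or the timeout elapses. An IOException from an
     * attempt is retried like "not acquired yet"; if time runs out while the most recent
     * attempt failed with an IOException, that exception is rethrown instead of hidden.
     */
    public boolean obtain(long timeoutMs) throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            IOException lastFailure = null;
            try {
                if (tryObtain()) {
                    return true;
                }
            } catch (IOException e) {
                lastFailure = e; // possibly transient, e.g. "Access is denied" on Windows
            }
            if (System.nanoTime() - deadline >= 0) {
                if (lastFailure != null) {
                    throw lastFailure; // do not mask a real I/O problem
                }
                return false; // genuinely held by someone else
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for lock");
            }
        }
    }
}
```

With a 1000 ms poll interval, a transient access-denied failure would cost a full second of waiting, which is one reason to also handle it closer to the file operation itself.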
Masking this access-denied error would not be a good idea. I think it is better for FSDirectory to throw the exception if its own retry also fails (as the patch currently does), and to let Lock.java apply its retry logic to an IOException as well. If the Lock class retry fails too, hiding the exception would again be problematic.

> I'm having trouble reproducing this issue. I copied the
> TestInterleavedAddAndRemoves.java into src/test/org/apache/lucene/index,
> then ran the test directly using "java org.junit.runner.JUnitCore
> org.apache.lucene.index.TestInterleavedAddAndRemoves",
> using a clean checkout of the current Lucene HEAD.
> The test is still running and is quite far along and I haven't hit any of the above errors.
>
> I'm running on Windows XP SP2, Sun JDK 1.5.0_07. I wonder if SP1 vs SP2 makes the difference?
>
> Could you also try [temporarily] turning off any virus / malware scanning tools?
> I wonder if you have one that's doing "live" checking and hold files open?
> (Though, I have a virus scanner running and it's not causing problems...).

I'm not sure. I am also running with svn head. I am trying again now, after turning off the anti-virus, disabling Windows indexing (though the service was already off), and disabling an AFS client service that was running. I will report here if the errors happen again.

But I am not sure how this should affect the decision on applying this fix: there will always be user machines out there running Lucene alongside other services. We could tell users, "Make sure that no other service or software running on your machine is holding, touching, or examining Lucene index files; otherwise, don't blame Lucene", but this is not easily done. Not all developers have control over, or understanding of, what is running on their machines; some programs are installed by system support staff, you know how it is.
So, while it is understandable that Lucene would fail if some malicious software actually grabs and holds Lucene files and interferes with them (for "long" periods of time), it would be nice to keep these failures to a minimum.

> I would like to reproduce this so I could test it against my fixes for lock-less commits!

The performance test case for 565 is a more aggressive test in this regard: it produced more of these errors for me, including rename() errors. To run it, apply the most recent patch from http://issues.apache.org/jira/browse/LUCENE-565, which is NewIndexWriter.Aug23.patch. Note that the run time (at least on my machine) is over 6 hours. By the way, I ran it with "ant test", after modifying junit.includes in build.xml to run my test.

> temporary file access denied on Windows
> ---------------------------------------
>
>              Key: LUCENE-665
>              URL: http://issues.apache.org/jira/browse/LUCENE-665
>          Project: Lucene - Java
>       Issue Type: Bug
>       Components: Store
> Affects Versions: 2.0.0
>      Environment: Windows
>         Reporter: Doron Cohen
>      Attachments: FSDirectory_Retry_Logic.patch, Test_Output.txt, TestInterleavedAddAndRemoves.java
>
>
> When interleaving adds and removes there is frequent opening/closing of readers and writers.
> I tried to measure performance in such a scenario (for issue 565), but the performance test failed - the indexing process crashed consistently with file "access denied" errors - "cannot create a lock file" in "lockFile.createNewFile()" and "cannot rename file".
>
> This is related to:
> - issue 516 (a closed issue: "TestFSDirectory fails on Windows") - http://issues.apache.org/jira/browse/LUCENE-516
> - user list questions due to file errors:
>   - http://www.nabble.com/OutOfMemory-and-IOException-Access-Denied-errors-tf1649795.html
>   - http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html
> - discussion on lock-less commits http://www.nabble.com/Lock-less-commits-tf2126935.html
>
> My test setup is: XP (SP1), JAVA 1.5 - both SUN and IBM SDKs.
>
> I noticed that the problem is more frequent when locks are created on one disk and the index on another. Both are NTFS with the Windows indexing service enabled. I suspect this indexing service might be related, keeping files busy for a while, but I don't know for sure.
>
> After experimenting with it, I conclude that these problems, at least in my scenario, are due to a temporary situation: the FS, or the OS, is *temporarily* holding references to files or folders, preventing them from being renamed or deleted, or preventing new files from being created in certain directories.
>
> So I added retry logic to FSDirectory for cases where the error was related to "Access Denied". This is the same approach suggested in http://www.nabble.com/running-a-lucene-indexing-app-as-a-windows-service-on-xp%2C-crashing-tf2053536.html; there, in addition to the retry, gc() is invoked (I did not call gc()). This is based on the *hope* that an access-denied situation would vanish after a small delay, and the retry would succeed.
>
> I modified FSDirectory this way for "Access Denied" errors when creating a new file and when renaming a file.
>
> This worked fine for me. The performance test that failed before now managed to complete. There should be no performance implications due to this modification, because only the cases that would otherwise wrongly fail now delay a few extra milliseconds and retry.
>
> I am attaching here a patch - FSDirectory_Retry_Logic.patch - that has these changes to FSDirectory.
> All "ant test" tests pass with this patch.
>
> Also attaching a test case that demonstrates the problem, at least on my machine. There are two test cases in that test file: one that works in the system temp directory (like most Lucene tests) and one that creates the index on a different disk. The latter case can only run if the path ("D:", "tmp") is valid.
>
> It would be great if people who experienced these problems could try out this patch and comment on whether it made any difference for them.
>
> If it turns out to be useful for others as well, including this patch in the code might help relieve some of those frustrating user cases.
>
> A comment on the state of the proposed patch:
> - It is not "ready to deploy" code - it has some debug printing, showing the cases where the "retry logic" actually took place.
> - I am not sure if the current 30ms is the right delay... why not 50ms? 10ms? This is currently defined by a constant.
> - Should a call to gc() be added? (I think not.)
> - Should the retry be attempted also on "non access-denied" exceptions? (I think not.)
> - I feel it is somewhat "voodoo programming", but though I don't like it, it seems to work...
>
> Attached files:
> 1. TestInterleavedAddAndRemoves.java - the LONG test that fails on XP without the patch and passes with the patch.
> 2. FSDirectory_Retry_Logic.patch
> 3. Test_Output.txt - output of the test with the patch, on my XP. Only the createNewFile() case had to be bypassed in this test, but for another program I also saw renameFile() being bypassed.
>
> - Doron
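The core idea of the patch described above, retrying a file operation once after a short delay when it fails with "Access Denied", can be sketched as follows. This is a minimal, hypothetical helper, not the contents of FSDirectory_Retry_Logic.patch: the class and method names are invented for illustration, it assumes the access-denied case can be recognized by the "Access is denied" text in the exception message (as seen in the stack trace earlier in this comment), and it retries exactly once using the 30ms delay mentioned in the issue. Other exceptions propagate immediately, and if the retry also fails, its exception propagates rather than being masked.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper illustrating retry-on-"Access Denied"; not the actual patch. */
final class AccessDeniedRetry {
    /** Delay before the retry; the issue uses 30ms and asks whether 10ms or 50ms would be better. */
    static final long RETRY_DELAY_MS = 30;

    /** A file-system operation that may fail with an IOException. */
    interface IoAction<T> {
        T run() throws IOException;
    }

    private AccessDeniedRetry() {}

    /** Assumption: the Windows error is recognized by its message text. */
    static boolean isAccessDenied(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Access is denied");
    }

    /** Runs the action; on an access-denied failure, waits delayMs and tries exactly once more. */
    static <T> T withRetry(IoAction<T> action, long delayMs) throws IOException {
        try {
            return action.run();
        } catch (IOException first) {
            if (!isAccessDenied(first)) {
                throw first; // not the transient case: fail immediately
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw first;
            }
            return action.run(); // a second failure propagates to the caller
        }
    }

    /** Example: lock-file creation with the retry applied. */
    static boolean createNewFileWithRetry(File file) throws IOException {
        return withRetry(file::createNewFile, RETRY_DELAY_MS);
    }
}
```

Keeping the retry inside the file operation, rather than in a lock's polling loop, means only the rare failing calls pay the extra delay, which matches the expectation that successful operations see no performance impact.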