Message-ID: <7384316.20591271071601134.JavaMail.jira@thor>
Date: Mon, 12 Apr 2010 07:26:41 -0400 (EDT)
From: "Shai Erera (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
In-Reply-To: <386970335.20021270756057764.JavaMail.jira@brutus.apache.org>

[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855924#action_12855924 ]

Shai Erera 
commented on LUCENE-2386:
------------------------------------

I don't think that people need to write that "emptiness-detection-then-commit" code ... if they care, they can simply call commit() immediately after they open IW.

bq. Isn't opening IW with CREATE* mode called "specifically asking for"?

It depends on how you interpret the mode ... for example, you cannot pass OpenMode.APPEND for an empty Directory, because IW throws an exception. The modes are just meant to tell IW how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to it.
* CREATE - I don't care if there is an index in the Directory -- create a new one, zeroing out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

So if you pass CREATE on an already populated index, IW doesn't do the implicit commit until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()? That's just an inconsistency, meant to allow you to open an IR immediately after the "new IW()" call, regardless of what was there. And if you open that IR, then if the index was populated you see the previous set of documents, but if it wasn't you see nothing, even though you meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();
fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}
* The second line creates an empty file immediately, not waiting for close() or flush() -- which resembles the behavior that you're suggesting we should take w/ IW (which is today's behavior).
* The fourth line closes the file, flushing and writing the content.
* The fifth line *recreates* the file, empty, again w/o calling close(). So it zeros out the file content immediately, even before you have written a single byte to it.
* The sixth and seventh lines prove it by attempting to read from the file, and the output printed is -1.

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same.

What I'm trying to show is that we don't fully adhere to the CREATE mode, and rightfully so if you ask me - we shouldn't zero out the segments until the application has called commit(). But we choose to adhere to the CREATE* mode differently depending on whether the index is already populated. That's inconsistent behavior, at least from my perspective. It's also harder to explain and document, e.g. "you should call commit() if you used CREATE, in case you want to zero out everything immediately, and the Directory is not empty, but you don't need to call commit() if the directory was empty, Lucene will do it for you." -- so now how will the app know if it should call commit()? It will need to write a sort of "emptiness-detection-then-commit"?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also prepares an empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the empty directory. No implicit commit is done by IW if the index does not exist.

But I think that CREATE behavior is too dangerous, and so I prefer to stick w/ the change proposed in the patch so far -- if you open an index in CREATE*, you should call commit() before you can read it. That will adhere to the semantics of what the application wanted, whether it meant to zero out an existing Directory, or create a new one from scratch.
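
For anyone who wants to repeat the FileOutputStream experiment described above without depending on d:/temp, here is a minimal self-contained sketch. The class name TruncateOnOpenDemo and the method firstByteAfterReopen are made up for illustration and are not part of any patch; the sketch uses a temp file and also covers the BufferedOutputStream variant.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class (not from the issue): shows that opening a
// FileOutputStream in non-append mode truncates the file right away.
public class TruncateOnOpenDemo {

    // Writes one byte and closes, reopens the file for writing without
    // writing anything, then returns the first read() result.
    // A return value of -1 means the file is already empty.
    public static int firstByteAfterReopen(File file, boolean buffered) throws IOException {
        try (OutputStream out = open(file, buffered)) {
            out.write(3); // the content reaches the file when the stream is closed
        }
        // Reopen for writing; nothing is written and close() is not called yet.
        OutputStream reopened = open(file, buffered);
        try (FileInputStream in = new FileInputStream(file)) {
            return in.read();
        } finally {
            reopened.close();
        }
    }

    // Opens the file in the default (non-append) mode, optionally buffered.
    private static OutputStream open(File file, boolean buffered) throws IOException {
        OutputStream out = new FileOutputStream(file);
        return buffered ? new BufferedOutputStream(out) : out;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("truncate-demo", ".bin");
        file.deleteOnExit();
        System.out.println("plain:    " + firstByteAfterReopen(file, false));
        System.out.println("buffered: " + firstByteAfterReopen(file, true));
    }
}
```

Both calls should report -1, matching what I saw: the reopen alone is enough to discard the previous content, which is the "zero out immediately" behavior that the CREATE discussion is compared against.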

> IndexWriter commits unnecessarily on fresh Directory
> ----------------------------------------------------
>
>                 Key: LUCENE-2386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2386
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch
>
>
> I've noticed that IndexWriter's ctor performs a first (empty) commit if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created.
> I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits.
> I tried changing doCommit to false in the IW ctor, but that got IndexFileDeleter jumping on me ... so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically!) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira