lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
Date Mon, 12 Apr 2010 11:26:41 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855924#action_12855924 ]

Shai Erera commented on LUCENE-2386:
------------------------------------

I don't think people need to write that "emptiness-detection-then-commit" code ... if
they care, they can simply call commit() immediately after they open IW.

bq. Isn't opening IW with CREATE* mode called "specifically asking for"?

It depends on how you interpret the mode ... for example, you cannot pass OpenMode.APPEND
for an empty Directory, because IW throws an exception. The modes are just meant to tell IW
how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to it.
* CREATE - I don't care if there is an index in the Directory -- create a new one, zeroing
out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

So if you pass CREATE on an already populated index, IW doesn't do the implicit commit; nothing
changes until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly
calls commit()? That's just an inconsistency, meant to allow you to open an IR immediately after
the "new IW()" call, regardless of what was there. And if you open that IR, then if the index was
populated you see the previous set of documents, but if it wasn't you see nothing, even though
you meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();
	  
fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}

* Second line creates an empty file immediately, not waiting for close() or flush() -- which
resembles the behavior that you're suggesting we should take w/ IW (which is today's
behavior).
* Fourth line closes the file, flushing and writing the content.
* Fifth line *recreates* the file, empty, again w/o calling close. So it zeros out the file
content immediately, even before you have written a single byte to it.
* Sixth and seventh lines prove it by attempting to read from the file, and the output printed
is -1.
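
The experiment above can be reproduced with a self-contained variant along these lines. It is only a sketch: the class name TruncateOnOpenDemo, the method readAfterReopen() and the use of a temp file instead of the hard-coded path are assumptions introduced for illustration, not part of the original test.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TruncateOnOpenDemo {

    /** Writes one byte, reopens the file for writing, then reads it back. */
    static int readAfterReopen() {
        try {
            // A temp file stands in for the hard-coded path (assumption).
            File file = File.createTempFile("tmpfile", ".bin");
            file.deleteOnExit();

            // Write a single byte and close, so the content reaches the file.
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(3);
            }

            // Reopen for writing without append mode: the file is truncated
            // on open, before anything is written or closed.
            try (FileOutputStream reopened = new FileOutputStream(file);
                 FileInputStream fis = new FileInputStream(file)) {
                return fis.read(); // -1 signals end of file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterReopen());
    }
}
```

Running main prints -1, matching the observation above: the previously written byte is gone as soon as the second FileOutputStream is constructed.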

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. What I'm trying to
show is that we don't fully adhere to the CREATE mode, and rightfully so if you ask me - we shouldn't
zero out the segments until the application calls commit(). But we choose to adhere differently
to the CREATE* mode if the index is already populated. That's inconsistent behavior, at
least from my perspective. It's also harder to explain and document, e.g. "you should call commit()
if you used CREATE, in case you want to zero out everything immediately, and the Directory
is not empty, but you don't need to call commit() if the directory was empty, Lucene will
do it for you." -- so now how will the app know if it should call commit()? It will need to
write a sort of "emptiness-detection-then-commit"?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also prepares an
empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the empty directory.
IW does not commit implicitly if the index does not exist.

But I think CREATE is too dangerous, and so I prefer to stick w/ the proposed change to the
patch so far -- if you open an index in CREATE*, you should call commit before you can read
it. That will adhere to the semantics of what the application wanted, whether it meant to
zero out an existing Directory, or create a new one from scratch.
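
The "no implicit commit" behavior described above can be illustrated with a toy model. This is not Lucene code: ToyDirectory, ToyWriter and their methods are hypothetical stand-ins invented for this sketch, the OpenMode enum only mirrors the three mode names from the discussion, and every commit point is kept (real deletion policies may prune them).

```java
import java.util.ArrayList;
import java.util.List;

class ToyIndexSketch {

    enum OpenMode { APPEND, CREATE, CREATE_OR_APPEND }

    /** Hypothetical stand-in for a Directory: just a list of commit points. */
    static final class ToyDirectory {
        // Each commit point is the list of documents visible at that commit.
        final List<List<String>> commits = new ArrayList<>();

        boolean indexExists() { return !commits.isEmpty(); }

        /** What a reader opened now would see; fails if nothing was committed. */
        List<String> openReader() {
            if (!indexExists()) throw new IllegalStateException("no commit to read");
            return commits.get(commits.size() - 1);
        }
    }

    /** Hypothetical writer that never commits implicitly in its constructor. */
    static final class ToyWriter {
        private final ToyDirectory dir;
        private final List<String> pending = new ArrayList<>();

        ToyWriter(ToyDirectory dir, OpenMode mode) {
            if (mode == OpenMode.APPEND && !dir.indexExists()) {
                throw new IllegalStateException("APPEND requires an existing index");
            }
            // CREATE starts from an empty pending state; the other modes
            // continue from the last commit when there is one.
            if (mode != OpenMode.CREATE && dir.indexExists()) {
                pending.addAll(dir.openReader());
            }
            this.dir = dir;
        }

        void addDocument(String doc) { pending.add(doc); }

        /** Only an explicit commit makes the writer's state visible to readers. */
        void commit() { dir.commits.add(new ArrayList<>(pending)); }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();

        ToyWriter first = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.commits.size()); // commit points right after opening
        first.addDocument("doc-1");
        first.commit();

        ToyWriter second = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.openReader());   // still the previously committed view
        second.commit();
        System.out.println(dir.openReader());   // the view after the explicit commit
    }
}
```

In this model the application never needs to detect emptiness: with CREATE, a fresh and a populated directory are treated the same, and readers only see the new (empty) index once commit() has been called.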

> IndexWriter commits unnecessarily on fresh Directory
> ----------------------------------------------------
>
>                 Key: LUCENE-2386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2386
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch,
LUCENE-2386.patch
>
>
> I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory
is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings
back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect
people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true?
If they want, they can simply call commit() right away on the IW they created.
> I ran into this when writing a test which committed N times, then compared the number
of commits (via IndexReader.listCommits) and was surprised to see N+1 commits.
> Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on
me .. so the change might not be that simple. But I think it's manageable, so I'll try to
attack it (and IFD specifically !) back :).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
