lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4339) Allow deletions on Lucene3x codecs again
Date Wed, 29 Aug 2012 19:44:08 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444350#comment-13444350
] 

Shai Erera commented on LUCENE-4339:
------------------------------------

I don't think that users complaining or not should affect our back-compat policy. I suspect
many people trying 4.0-* didn't yet deploy it in production, at least not large productions.
So we cannot let that affect our decisions. I'm pretty sure that in the Big Data world, it's
not going to be accepted that you need to re-index the data, or run a migration tool over
indexes that may be TBs in size, just because it may be hard to support deleting documents
from older segments (and only one version back at that!).

In short, let's postpone decisions such as "you must run an upgrader tool" until a real problem
comes up in one of the future releases. There's no need to make such a decision at this point.
                
> Allow deletions on Lucene3x codecs again
> ----------------------------------------
>
>                 Key: LUCENE-4339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4339
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0-BETA
>            Reporter: Uwe Schindler
>            Priority: Blocker
>             Fix For: 4.0
>
>         Attachments: LUCENE-4339.patch
>
>
> On dev@lao Hoss reported that a user in Solr was not able to update or delete documents
in his 3.x index with Solr 4:
> {quote}
> On the solr-user list, Dirk Högemann recently mentioned a problem he was seeing when
he tried upgrading his existing solr setup from 3.x to 4.0-BETA.  Specifically this exception
getting logged...
> http://find.searchhub.org/document/cdb30099bfea30c6
> auto commit error...:java.lang.UnsupportedOperationException: this codec can only be
used for reading
>          at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>          at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
>          at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
>          at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
>          at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>          at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
>          at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
>          at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>          at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>          at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>          at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
> Dirk was able to work around this by completely re-indexing, but it seemed strange to
me that this would happen.
> My understanding is that even though an IndexUpgrader tool was now available, it wasn't
going to be required for users to use it when upgrading from 3.x to 4.x.  Explicitly upgrading
the index format might be a good idea, and might make the index more performant, but as I
understood it, the way things had been implemented with codecs, explicitly upgrading the index
format wasn't strictly necessary, and users should be able to upgrade their Lucene apps
the same way that was supported with other index format upgrades in the past: the old index can
be read, and as changes are made new segments will be re-written in the new format.  (Note
in particular: at the moment we don't mention IndexUpgrader in MIGRATE.txt at all.)
> It appears, however, based on this stack trace and some other experiments I tried, that
any attempts to "delete" documents in a segment that is using the Lucene3xCodec will fail.
> This seems like a really scary time bomb situation, because if you upgrade, things will
seem to be working -- you can even add documents, and depending on the order that you do things,
some "old" segments may get merged and use the new format, so *some* deletes of "old" documents
(in those merged/upgraded segments) may work, but then somewhere down the road, you may try
a delete that affects docs in a still un-merged/un-upgraded segment, and that delete will fail
-- 5 minutes later, if another merge has happened, attempting to do the exact same delete
may succeed.
> All of which begs the question: is this a known/intended limitation of the Lucene3xCodec,
or an oversight in the Lucene3xCodec?
> If it's expected, then it seems like we should definitely spell out this limitation in
MIGRATE.txt and advocate either full rebuilds, or the use of IndexUpgrader for anyone whose
indexes are non-static.
> On the Solr side of things, I think we should even consider automatically running
IndexUpgrader on startup if we detect that the Lucene3xCodec is in use to simplify things
-- we can't even suggest running "optimize" as a quick/easy way to force an index format
upgrade because if the 3x index is already optimized then it's a no-op and the index stays
in the 3x format.
> {quote}
> Robert said that this is an intended limitation (in fact it's explicitly added to the code;
without that UOE it "simply works"), but I and lots of other people disagree:
> {quote}
> In the early days (I mean in the time when it was already read only until we refactored
the IndexReader.delete()/Codec stuff), this was working, because the LiveDocs were always
handled in a special way. Making it now 100% read-only is in my opinion very bad, as it does
not allow updating documents in a 3.x index anymore, so you have no choice: you must run
IndexUpgrader.
> The usual step of opening an old index and adding documents works (because the new documents
are always added to a new segment), but the much more usual IW.updateDocument(), which is commonly
used also to add documents, fails on old indexes. This is a no-go, we have to fix this. If
we allow the trick with updating LiveDocs on the 3.x codec, for the end-user the "read-only" stuff
in the Lucene3x codec would be completely invisible, as he can do everything IndexWriter provides.
The other horrible things like changing norms are no longer possible, so deletes are the only
thing affected here. The read-only-ness of the Lucene3x codec would only be visible to the
user when someone tries to explicitly create an index with the Lucene3x codec. And I understood
the CHANGES/MIGRATE.txt exactly as that.
> {quote}
> On the list, Robert added a simple patch, reverting the UOE in Lucene3xCodec, so the
LiveDocs format is RW again.
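
To make the "time bomb" behaviour described above concrete, here is a small toy model in Java. It is only a sketch and does not use Lucene's classes: ReadOnlyCodecSketch, LiveDocsWriter, Segment, tryDelete and merge are all hypothetical names invented for illustration. The only thing borrowed from the report is the exception message. It assumes, as the stack trace suggests, that a delete must persist the updated live-docs bits through the segment's codec, so a delete against an old-format segment fails while the same delete after a merge into the new format succeeds.

```java
import java.util.BitSet;

public class ReadOnlyCodecSketch {

    /** Hypothetical stand-in for the part of a codec that persists live docs. */
    interface LiveDocsWriter {
        void writeLiveDocs(BitSet liveDocs);
    }

    /** Models the old-format codec: any attempt to write live docs is rejected. */
    static final LiveDocsWriter READ_ONLY = liveDocs -> {
        throw new UnsupportedOperationException("this codec can only be used for reading");
    };

    /** Models the current-format codec: writing live docs is accepted (a no-op here). */
    static final LiveDocsWriter READ_WRITE = liveDocs -> { };

    /** Toy segment: a name, a live-docs bitset, and the writer of its format. */
    static final class Segment {
        final String name;
        final BitSet liveDocs = new BitSet();
        final LiveDocsWriter writer;

        Segment(String name, int docCount, LiveDocsWriter writer) {
            this.name = name;
            this.writer = writer;
            this.liveDocs.set(0, docCount); // all documents start out live
        }

        /** Deleting a document means persisting an updated live-docs bitset. */
        void delete(int docId) {
            BitSet updated = (BitSet) liveDocs.clone();
            updated.clear(docId);
            writer.writeLiveDocs(updated); // throws for the read-only format
            liveDocs.clear(docId);         // only reached if the write succeeded
        }
    }

    /** Returns true if the delete went through, false if the format rejected it. */
    static boolean tryDelete(Segment segment, int docId) {
        try {
            segment.delete(docId);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    /** Toy merge: surviving docs are rewritten into a segment of the current format. */
    static Segment merge(Segment old) {
        return new Segment(old.name + "_merged", old.liveDocs.cardinality(), READ_WRITE);
    }

    public static void main(String[] args) {
        Segment old = new Segment("_0", 3, READ_ONLY);
        System.out.println(tryDelete(old, 1));        // false: old-format segment
        System.out.println(tryDelete(merge(old), 1)); // true: same delete after a merge
    }
}
```

In this model, swapping READ_ONLY for READ_WRITE on the old segment is the analogue of the proposed patch: only the live-docs write is allowed again, and nothing else about the old format changes.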

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
