lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (LUCENE-3082) Add index upgrade method to IndexWriter to force an upgrade of all segments to last recent supported index format without optimizing
Date Sun, 08 May 2011 18:31:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030519#comment-13030519 ]

Uwe Schindler edited comment on LUCENE-3082 at 5/8/11 6:29 PM:
---------------------------------------------------------------

Patch that implements this with a merge policy:

It does not yet contain the command-line updater. If you want to upgrade an old index, the
API code to do this is very simple:

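{code:java}
// Version.LUCENE_XX is a placeholder for the current version constant.
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_XX, new KeywordAnalyzer());
// Wrap the default merge policy with the upgrading one from the patch.
iwc = iwc.setMergePolicy(new UpgradeIndexMergePolicy(iwc.getMergePolicy()));
IndexWriter w = new IndexWriter(dir, iwc);
// With the wrapper in place, optimize() also rewrites old-format segments.
w.optimize();
w.close();
{code}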

The patch contains new tests in TestBackwards that verify the upgrade process:

- It tries to upgrade all old indexes from the well-known list in TestBackwards. When this
is done, each of them should contain exactly one segment (because all segments previously in
the index are of an older version, so they are merged/optimized together into the new format).
It also verifies that all segment versions equal Constants.LUCENE_MAIN_VERSION.
- It tries to upgrade two old, already optimized indexes (of the previous version; I changed
TestBackwards in my 3.1 checkout to generate those). It verifies the segment versions after
the upgrade. This special case is needed, as optimizing a one-segment index is a no-op without
the special merge policy.
- It uses the old optimized indexes, opens them using the standard merge policy, and adds some
documents to them. After that it upgrades the index with a new IndexWriter using the special
merge policy. In that case (as some segments are already in the new version), the index should
only have the old segments merged together; the newly added ones are untouched. So the segment
count is verified to be > 1.
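
The three cases above can be illustrated with a small toy model. This is only a sketch and not the code from the patch: the class UpgradeSelectionSketch, its nested Segment type, and the version strings used with it are hypothetical stand-ins, and no Lucene classes are involved. It assumes the upgrade merges every segment whose version differs from the current one into a single new segment and leaves current-version segments alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: plain strings stand in for segment versions; this is not Lucene code. */
final class UpgradeSelectionSketch {

    /** Hypothetical stand-in for a segment: a name plus the version that wrote it. */
    static final class Segment {
        final String name;
        final String version;

        Segment(String name, String version) {
            this.name = name;
            this.version = version;
        }
    }

    /** Returns the segments whose version differs from the current version. */
    static List<Segment> selectOldSegments(List<Segment> index, String currentVersion) {
        List<Segment> old = new ArrayList<>();
        for (Segment s : index) {
            if (!currentVersion.equals(s.version)) {
                old.add(s);
            }
        }
        return old;
    }

    /**
     * Simulates the upgrade: all old segments (even a single one) are replaced
     * by one merged segment in the current version; current segments are kept.
     */
    static List<Segment> upgrade(List<Segment> index, String currentVersion) {
        List<Segment> result = new ArrayList<>();
        if (!selectOldSegments(index, currentVersion).isEmpty()) {
            result.add(new Segment("merged", currentVersion));
        }
        for (Segment s : index) {
            if (currentVersion.equals(s.version)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

In this model an index with only old segments ends up with exactly one segment, a single old segment is still rewritten, and an index with one old and several new segments keeps a segment count > 1.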

      was (Author: thetaphi):
    Path that implements this with a merge policy:

It does not yet contain the command line updater, if you want to upgrade an old index, the
API code to do this is very simple:

{code:java}
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_XX, new KeywordAnalyzer());
iwc = iwc.setMergePolicy(new UpgradeIndexMergePolicy(iwc.getMergePolicy()));
IndexWriter w = new IndexWriter(dir, iwc);
w.optimize();
w.close();
{code}

The patch contains new tests in TestBackwards that verify the upgrade process:

- It tries to upgrade all old segments in the well-known list. When this is done, all of them
should contain exactly one segment (because all segments previously in index are older version,
so they are merged/optimized together in new format). It also verifies all segment versions
to be Constants.LUCENE_MAIN_VERSION.
- It tries to upgrade two old, already optimized indexes (with prev version, I changed TestBackwards
in my 3.1 checkout to generate those). It verifies the segment versions after the upgrade.
This special case is needed, as optimizing a one-segment index is a no-op without the special
merge-policy
- It uses the old optimized indexes, opens them using standard merge policy and adds some
documents to them. After that it upgrades the index, in that case (as some segments are already
in new version), the index should only have the old-segments merged together, the newly added
ones are untouched. So segment count > 1
  
> Add index upgrade method to IndexWriter to force an upgrade of all segments to last recent
supported index format without optimizing
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3082
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3082
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3082.patch, index.31.optimized.cfs.zip, index.31.optimized.nocfs.zip
>
>
> Currently if you want to upgrade an old index to the format of your current Lucene version,
you have to optimize your index or use addIndexes(IndexReader...) [see LUCENE-2893] to copy
to a new directory. The optimize() approach fails if your index is already optimized.
> I propose to add a method to IndexWriter that is similar to optimize() and uses a custom
MergePolicy to upgrade all segments to the latest format. This MergePolicy could also simply
ignore all segments that are already up-to-date. All segments in prior formats would be merged
into a new segment. The tool could optionally also optimize the index.
> This issue is different from LUCENE-2893, as it would only support upgrading indexes
from previous Lucene versions in-place using the official path. It is a tool for the end user,
not a developer tool.
> This addition should also go to Lucene 3.x, as we need to make users with pre-3.0 indexes
go the step through 3.x, else they would not be able to open their index with 4.0. With this
tool in 3.x the users could safely upgrade their index without relying on optimize to work
on already-optimized indexes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

