lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3126) IndexWriter.addIndexes can make any incoming segment into CFS if it isn't already
Date Mon, 23 May 2011 19:00:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038136#comment-13038136 ]

Michael McCandless commented on LUCENE-3126:
--------------------------------------------


bq. Patch does not handle all files well (few tests fail). Apparently, the .del file should
not be rolled into the .cfs.

Right, .del files never appear inside a CFS.

bq. SegmentMerger.createCompoundFile does this by default, however it's only called from code
that ensures no deletions exist. Would have been nice if this method documented it.

Please add comments to this!  It's non-obvious ;)

bq. Also, I think *.s<num> should not be rolled into .cfs (those are the separate norms
files). I don't know how to create such files in the first place (thought they're of old format,
but 3.1 indexes have them also), and TestBackCompat fails.

Right, these too only live outside a CFS.  You create them by opening a writable IndexReader
(I know: confusing!) and calling setNorm, then closing it.  They are not only for old indices...
4.0 creates them too.

bq. Is there a way to identify those files? Is it safe to check if the file extension starts
w/ IndexFileNames.SEPARATE_NORMS_EXTENSION? Feels hacky to me.

Hackish though it seems (I agree) I think that's the only way?  SegmentInfo.hasSeparateNorms
is equally hacky...
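
For illustration, the extension check discussed above could look roughly like the sketch below. It is self-contained and does not use Lucene classes: the two constants are local stand-ins whose values ("del" and "s") are assumed to match IndexFileNames.DELETES_EXTENSION and IndexFileNames.SEPARATE_NORMS_EXTENSION, and the class and method names are hypothetical, not part of the patch. It is also slightly stricter than a bare startsWith test, since it requires digits after the prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Lucene code: decides which segment files
// may be copied into a compound file (.cfs).
class CfsFileFilter {
    // Assumed to mirror IndexFileNames.DELETES_EXTENSION.
    static final String DELETES_EXTENSION = "del";
    // Assumed to mirror IndexFileNames.SEPARATE_NORMS_EXTENSION.
    static final String SEPARATE_NORMS_EXTENSION = "s";

    // Returns the text after the last '.', or "" if there is no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // True for extensions of the form s<num>, e.g. "s0" or "s12".
    static boolean isSeparateNorms(String ext) {
        int prefix = SEPARATE_NORMS_EXTENSION.length();
        if (!ext.startsWith(SEPARATE_NORMS_EXTENSION) || ext.length() == prefix) {
            return false;
        }
        for (int i = prefix; i < ext.length(); i++) {
            if (!Character.isDigit(ext.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Deletions and separate norms stay outside the CFS; everything else may go in.
    static boolean belongsInCompoundFile(String fileName) {
        String ext = extensionOf(fileName);
        return !ext.equals(DELETES_EXTENSION) && !isSeparateNorms(ext);
    }

    // Filters a segment's file list down to the files eligible for the CFS.
    static List<String> filesForCompoundFile(List<String> segmentFiles) {
        List<String> result = new ArrayList<String>();
        for (String file : segmentFiles) {
            if (belongsInCompoundFile(file)) {
                result.add(file);
            }
        }
        return result;
    }
}
```

With this sketch, a file list such as _0.fdt, _0.nrm, _0_1.del, _0_1.s0 would keep only the first two names for the CFS.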

bq. Another thing, I think in order to avoid shared doc stores (and whatever other old-format)
stuff, since it's only an optimization, that the code should copy into CFS only if the segment
version is on or after 3.1 (that is StringHelper.getVersionComparator().compare(info.getVersion(),
"3.1") >= 0).

Shared doc stores, yes, but the separate del docs / norms are produced by all versions.
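
To make the proposed "on or after 3.1" test concrete, here is a minimal stand-alone sketch of a dotted-version comparison. It is not the implementation behind StringHelper.getVersionComparator(); the class name and the isOnOrAfter helper are hypothetical, and it assumes version strings made only of dot-separated integers. Note that, per the reply above, such a check would only guard against shared doc stores; separate deletions and norms still need their own handling.

```java
import java.util.Comparator;

// Hypothetical comparator, not Lucene code: compares versions such as
// "3.0.3" and "3.1" numerically, component by component.
class DottedVersionComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int n = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "3.1" equals "3.1.0".
            int va = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int vb = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (va != vb) {
                return va < vb ? -1 : 1;
            }
        }
        return 0;
    }

    // True if segmentVersion is the same as or newer than minimum.
    static boolean isOnOrAfter(String segmentVersion, String minimum) {
        return new DottedVersionComparator().compare(segmentVersion, minimum) >= 0;
    }
}
```

A numeric comparison matters here because a plain string comparison would order "3.10" before "3.9".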

More generally: does addIndexes properly refuse to import a too-old index?  We should throw
IndexFormatTooOldException in this case?  (And maybe also IndexFormatTooNewException?)


> IndexWriter.addIndexes can make any incoming segment into CFS if it isn't already
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3126
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3126.patch
>
>
> Today, IW.addIndexes(Directory) does not modify the CFS-mode of the incoming segments.
> However, if IndexWriter's MP wants to create CFS (in general), there's no reason why not turn
> the incoming non-CFS segments into CFS. We anyway copy them, and if MP is not against CFS,
> we should create a CFS out of them.
> Will need to use CFW, not sure it's ready for that w/ current API (I'll need to check),
> but luckily we're allowed to change it (@lucene.internal).
> This should be done, IMO, even if the incoming segment is large (i.e., passes MP.noCFSRatio)
> b/c like I wrote above, we anyway copy it. However, if you think otherwise, speak up :).
> I'll take a look at this in the next few days.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

