lucene-dev mailing list archives

From "Uwe Schindler (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (LUCENE-3560) add extra safety to concrete codec implementations
Date Sat, 05 Nov 2011 15:40:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144724#comment-13144724 ]

Uwe Schindler edited comment on LUCENE-3560 at 11/5/11 3:39 PM:
----------------------------------------------------------------

bq. I'd like to extend an existing codec to add one file to files() - bummer, I have to reimplement
the whole codec now

The abstract base class Codec is as stupidly simple as Analyzer. There is no logic in it; it
just defines the following:
- the name of the codec (which cannot be changed by subclassing!)
- factory methods for the format readers/writers of the different parts of an index (postings,
stored fields, segments file, ...)

If you want to create a new codec, you simply have to write this thin wrapper class with a
new name; otherwise SPI won't work.

The first point in the list above is the real bummer in your "I only want to add one file"
approach. If you subclass an existing codec, the name can no longer change. This name is written
into the index. When you open an IndexReader, it reads the name and uses Codec.forName()
to look up the codec. Because the name was not changed in your subclass, it would then use the
superclass to read the index -> #fail
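
To make the failure mode concrete, here is a minimal, self-contained sketch. It deliberately
does not use Lucene's classes: ToyCodec, ToyRegistry, ExtraFileSubclass, ExtraFileCodec and the
file names are all hypothetical stand-ins, and the map-based registry only assumes the
name-based lookup described above, not how Codec.forName() is actually implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for Codec: the name is fixed by whoever calls super(name).
abstract class ToyCodec {
  private final String name;
  protected ToyCodec(String name) { this.name = name; }
  public final String getName() { return name; }  // final: subclasses cannot change it
  public abstract String[] files();
}

class ToySimpleTextCodec extends ToyCodec {
  ToySimpleTextCodec() { super("SimpleText"); }
  @Override public String[] files() { return new String[] {"seg.txt"}; }
}

// The "I only want to add one file" approach: the name "SimpleText" is inherited.
class ExtraFileSubclass extends ToySimpleTextCodec {
  @Override public String[] files() { return new String[] {"seg.txt", "extra.bin"}; }
}

// The wrapper approach: a new name, delegating to the existing codec.
class ExtraFileCodec extends ToyCodec {
  private final ToyCodec delegate = new ToySimpleTextCodec();
  ExtraFileCodec() { super("SimpleTextExtra"); }
  @Override public String[] files() {
    String[] base = delegate.files();
    String[] all = Arrays.copyOf(base, base.length + 1);
    all[base.length] = "extra.bin";
    return all;
  }
}

// Assumed model of the read side: codecs are found by name via a no-arg constructor.
class ToyRegistry {
  private static final Map<String, Supplier<ToyCodec>> BY_NAME = new HashMap<>();
  static {
    BY_NAME.put("SimpleText", ToySimpleTextCodec::new);
    BY_NAME.put("SimpleTextExtra", ExtraFileCodec::new);
  }

  static ToyCodec forName(String name) {
    Supplier<ToyCodec> factory = BY_NAME.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("unknown codec: " + name);
    }
    return factory.get();
  }

  // Simulates write-then-open: only the codec name survives in the "index".
  static ToyCodec reopen(ToyCodec usedForWriting) {
    return forName(usedForWriting.getName());
  }
}
```

In this toy model, ToyRegistry.reopen(new ExtraFileSubclass()) hands back a plain
ToySimpleTextCodec that knows nothing about extra.bin, while
ToyRegistry.reopen(new ExtraFileCodec()) resolves to the codec that wrote the segment.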
                
      was (Author: thetaphi):
    bq. I'd like to extend an existing codec to add one file to files() - bummer, I have to
reimplement the whole codec now

The abstract base class Codec is as stupid simple as Analyzer. There is no logic in it, it
just defines the following:
- name of codec (which cannot be changed by subclassing!!!)
- factory methods for the format readers/writers of the different parts of an index (postings,
stored fields, segments file,...)

If you want to create a new codec, you have to simply write this wrapper with a new name,
otherwise SPI won't work.
                  
> add extra safety to concrete codec implementations
> --------------------------------------------------
>
>                 Key: LUCENE-3560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3560
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3560.patch
>
>
> In LUCENE-3490, we reorganized the codec model, and a key part of this is that Codecs are "safer"
> and don't rely upon client-side configuration: IndexReader doesn't take Codec or anything of that
> nature, only IndexWriter.
> Instead for "read" all codecs are initialized from the classpath via a no-arg ctor from Java's
> Service Provider Mechanism.
> So, although Codecs can still take parameters in the constructors, be subclassable, etc (for passing
> to IndexWriter), this enforces that they must write any configuration information they need into
> the index, so that we don't have a flimsy API.
> I think we should go even further, for additional safety. Any methods on our concrete codecs that
> are not intended to be subclassed should be final, and we should add assertions to verify this.
> For example, SimpleText's files() implementation should be final. If you want to make an extension
> of simpletext that has additional files, then this is a different index format and should have a
> different name!
> Note: This doesn't stop extensibility, only stupid mistakes.
> For example, this means that Lucene40Codec's postingsFormat() implementation is final, even though
> it offers a configurable "hook" (getPostingsFormatForField) for you to specify per-field postings
> formats (which it writes into a .per file into the index, so that it knows how to read each field).
> {code}
> private final PostingsFormat postingsFormat = new PerFieldPostingsFormat() {
>   @Override
>   public PostingsFormat getPostingsFormatForField(String field) {
>     return Lucene40Codec.this.getPostingsFormatForField(field);
>   }
> };
> ...
> @Override
> public final PostingsFormat postingsFormat() {
>   return postingsFormat;
> }
> ...
>   /** Returns the postings format that should be used for writing 
>    *  new segments of <code>field</code>.
>    *  
>    *  The default implementation always returns "Lucene40"
>    */
>   public PostingsFormat getPostingsFormatForField(String field) {
>     return defaultFormat;
>   }
> {code}
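
The quoted description also proposes assertions that verify the "should be final" methods
really are final. One possible way to check that is with plain reflection, sketched below.
FinalityCheck, LockedDownCodec and OpenCodec are hypothetical names made up for this
illustration; nothing here is taken from the attached patch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: reports whether a public no-arg method can still be overridden.
class FinalityCheck {
  static boolean isLockedDown(Class<?> clazz, String methodName) {
    try {
      Method m = clazz.getMethod(methodName);
      // A method cannot be overridden if it is final or if its class is final.
      return Modifier.isFinal(m.getModifiers()) || Modifier.isFinal(clazz.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("no public no-arg method " + methodName, e);
    }
  }
}

// Hypothetical codec-like classes used only to exercise the check.
class LockedDownCodec {
  public final String[] files() { return new String[] {"seg.txt"}; }
}

class OpenCodec {
  public String[] files() { return new String[] {"seg.txt"}; }
}
```

A test could then assert FinalityCheck.isLockedDown(LockedDownCodec.class, "files") and expect
the same call on OpenCodec.class to return false.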

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

