lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0
Date Thu, 01 Dec 2011 01:41:40 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160559#comment-13160559 ]

Robert Muir commented on LUCENE-3606:
-------------------------------------

{quote}
Finally, the "holy grail", where similarities can declare the normalization factor(s) they need,
using byte/float/int or whatever, and it's all unified with the DocValues API. IndexReader.norms()
maybe goes away here, and maybe NormsFormat too.
{quote}

Thinking about this: a clean way to do it would be for Similarity to get a new method:
{code}
ValueType getValueType();
{code}

and we would change:
{code}
byte computeNorm(FieldInvertState state);
{code}
to:
{code}
void computeNorm(FieldInvertState state, PerDocFieldValues norm);
{code}
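To make the shape of this concrete, here is a minimal, self-contained sketch of the proposed contract. All type names in it are hypothetical stand-ins, not the trunk classes: NormValueType plays the role of ValueType (only constants mentioned in this comment), NormSink plays PerDocFieldValues, and InvertState plays FieldInvertState with just a length and a boost. The sim declares its type once, and computeNorm() writes into a typed slot instead of returning a byte.

```java
// Hypothetical stand-in for the DocValues ValueType enum (subset only).
enum NormValueType { FIXED_INTS_8, FLOAT_32, BYTES_FIXED_STRAIGHT }

// Hypothetical stand-in for FieldInvertState: only what this sketch needs.
final class InvertState {
  final int length;   // number of tokens indexed for the field
  final float boost;  // index-time boost
  InvertState(int length, float boost) { this.length = length; this.boost = boost; }
}

// Hypothetical stand-in for PerDocFieldValues: one typed per-document slot.
final class NormSink {
  private Object value;                      // Long, Float or byte[] depending on the type
  void setInt(long v)     { value = v; }
  void setFloat(float v)  { value = v; }
  void setBytes(byte[] v) { value = v.clone(); }
  Object get()            { return value; }
}

// The proposed shape: the sim declares its norm type, then fills the sink.
interface NormSimilarity {
  NormValueType getValueType();              // null => this sim needs no norms
  void computeNorm(InvertState state, NormSink norm);
}

// Example sim storing boost / sqrt(length) as an uncompressed 32-bit float.
final class FloatLengthNormSim implements NormSimilarity {
  @Override public NormValueType getValueType() { return NormValueType.FLOAT_32; }

  @Override public void computeNorm(InvertState state, NormSink norm) {
    norm.setFloat((float) (state.boost / Math.sqrt(state.length)));
  }
}
```

In this sketch, returning null from getValueType() is how a sim would say it needs no norms at all.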

Sims that want to encode multiple index-time scoring factors separately
could just use BYTES_FIXED_STRAIGHT. This should only be needed by a few rare
sims anyway, because a Sim can already pull named, application-specific scoring
factors from IR.perDocValues() today.
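As a hypothetical illustration of the multi-factor case: a BYTES_FIXED_STRAIGHT value of width 2 could carry a length-norm byte plus a second index-time factor (a "quality" byte invented here purely for the example). TwoFactorNorm and its methods are made-up names, not a Lucene API; the point is only that every document's value has the same fixed width, so the sim decides the layout.

```java
// Hypothetical two-factor norm layout for a BYTES_FIXED_STRAIGHT value of width 2.
final class TwoFactorNorm {
  static final int WIDTH = 2; // every document's value has exactly this many bytes

  // Pack both factors (each 0..255) into one fixed-width record.
  static byte[] pack(int lengthNorm, int quality) {
    checkUnsignedByte(lengthNorm);
    checkUnsignedByte(quality);
    return new byte[] { (byte) lengthNorm, (byte) quality };
  }

  // Read factor i (0 = length norm, 1 = quality) back as an unsigned value.
  static int factor(byte[] record, int i) {
    if (record.length != WIDTH) {
      throw new IllegalArgumentException("expected " + WIDTH + " bytes");
    }
    return record[i] & 0xFF;
  }

  private static void checkUnsignedByte(int v) {
    if (v < 0 || v > 255) throw new IllegalArgumentException("out of range: " + v);
  }
}
```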

It's not too crazy either, since sims already do their own encoding,
so e.g. the default sim would just use FIXED_INTS_8.
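For the default sim, the one-byte value would be the same lossy 8-bit float it produces today. The sketch below is modelled on the 3-mantissa-bit / zero-exponent-15 scheme of Lucene's SmallFloat.floatToByte315 and byte315ToFloat, reproduced so the example is self-contained; ByteNorms is a hypothetical name, and lengthNorm() assumes the classic boost / sqrt(numTerms) formula.

```java
// Hypothetical helper modelled on SmallFloat's "315" encoding: a float squeezed
// into one byte by keeping its exponent and the top 3 mantissa bits.
final class ByteNorms {
  private static final int MANTISSA_BITS = 3;
  private static final int ZERO_EXP = 15;
  // Encodings at or below this prefix underflow to byte 0 or 1.
  private static final int LOW = (63 - ZERO_EXP) << MANTISSA_BITS;

  // Lossy float -> byte.
  static byte encode(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - MANTISSA_BITS);
    if (smallfloat <= LOW) {
      return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
    }
    if (smallfloat >= LOW + 0x100) {
      return -1; // too large: clamp to the maximum byte value (0xFF)
    }
    return (byte) (smallfloat - LOW);
  }

  // byte -> float (exact inverse for values that were representable).
  static float decode(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xFF) << (24 - MANTISSA_BITS);
    bits += (63 - ZERO_EXP) << 24;
    return Float.intBitsToFloat(bits);
  }

  // Classic length norm, stored as a single FIXED_INTS_8-style byte.
  static byte lengthNorm(float boost, int numTerms) {
    return encode((float) (boost / Math.sqrt(numTerms)));
  }
}
```

Under this scheme 1.0f encodes to 124 and 0.5f to 120, and both decode back exactly.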

People who don't want to mess with bytes or SmallFloat could use types
like FLOAT_32 if they want and need them.

We would just change FieldInfo.omitNorms to FieldInfo.normValueType,
which is the value type of the norm (null if it's omitted, just like docValueType).

The preflex FieldInfosReader would just set FIXED_INTS_8 or null, based on
whether the fieldinfos had omitNorms or not. It doesn't support
any other types.
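A sketch of those two pieces, using hypothetical stand-ins (FieldInfoSketch, PreflexNorms, and a NormValueType enum limited to the constants named in this comment), shows that a nullable type fully subsumes the old boolean.

```java
// Hypothetical stand-in for the DocValues ValueType enum (subset only).
enum NormValueType { FIXED_INTS_8, FLOAT_32, BYTES_FIXED_STRAIGHT }

// Hypothetical stand-in for FieldInfo after the change: a nullable norm type
// replaces the omitNorms boolean, mirroring how docValueType is nullable.
final class FieldInfoSketch {
  final String name;
  final NormValueType normValueType; // null => norms omitted

  FieldInfoSketch(String name, NormValueType normValueType) {
    this.name = name;
    this.normValueType = normValueType;
  }

  boolean hasNorms() { return normValueType != null; }
}

// Hypothetical preflex mapping: old indexes only ever had single-byte norms,
// so the only possible outcomes are FIXED_INTS_8 or null.
final class PreflexNorms {
  static NormValueType fromOmitNorms(boolean omitNorms) {
    return omitNorms ? null : NormValueType.FIXED_INTS_8;
  }
}
```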

Finally then, sims would own their scoring factors, and we could
even remove omitNorms from Field/FieldType etc. (just use the correct
scoring algorithm for the field: if you don't want norms, use a sim
that doesn't need them for scoring).
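One hypothetical way the indexing side could then derive norms per field: record whatever type the field's sim declares, with no separate omitNorms switch that could disagree with the scoring formula. DeclaresNorms, LengthNormalizedSim, NoNormsSim and NormTypeRegistry are all illustrative names, not proposed classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DocValues ValueType enum (subset only).
enum NormValueType { FIXED_INTS_8, FLOAT_32, BYTES_FIXED_STRAIGHT }

// Hypothetical minimal view of the proposed Similarity: only the declared norm type.
interface DeclaresNorms {
  NormValueType getValueType(); // null => this scoring formula uses no norms
}

// A length-normalizing sim that needs a one-byte norm.
final class LengthNormalizedSim implements DeclaresNorms {
  public NormValueType getValueType() { return NormValueType.FIXED_INTS_8; }
}

// A sim whose formula ignores field length (e.g. for short ID or tag fields).
final class NoNormsSim implements DeclaresNorms {
  public NormValueType getValueType() { return null; }
}

// Hypothetical bookkeeping: the per-field norm type comes straight from the sim.
final class NormTypeRegistry {
  private final Map<String, NormValueType> types = new HashMap<>();

  void register(String field, DeclaresNorms sim) {
    types.put(field, sim.getValueType()); // HashMap allows the null "no norms" value
  }

  boolean hasNorms(String field) { return types.get(field) != null; }

  NormValueType typeOf(String field) { return types.get(field); }
}
```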

This would remove the awkward, messy situation where every similarity
implementation we have has to 'downgrade' itself to handle cases like
the user deciding to omit parts of its formula!
                
> Make IndexReader really read-only in Lucene 4.0
> -----------------------------------------------
>
>                 Key: LUCENE-3606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3606
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>
> As we change the API completely in Lucene 4.0, we are also free to remove read-write access
> and commits from IndexReader. This code is so hairy and buggy (as investigated by Robert and
> Mike today) when you work at the SegmentReader level but forget to flush in the DirectoryReader
> that it's better to really make IndexReaders read-only.
> Currently with IndexReader you can do things like:
> - delete/undelete Documents -> can be done with IndexWriter, too (using deleteByQuery)
> - change norms -> this is a bad idea in general, but once we remove norms entirely and
> replace them with DocValues, this is obsolete anyway. Changing DocValues should also be done using
> IndexWriter in trunk (once it is ready)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


