lucene-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <>
Subject [jira] Updated: (LUCENE-1261) Impossible to use custom norm encoding/decoding
Date Sat, 17 May 2008 02:31:55 GMT


Otis Gospodnetic updated LUCENE-1261:

    Priority: Minor  (was: Major)
    Assignee: Otis Gospodnetic

John, do you want to keep this open, or can I close this now that Karl is making progress
on LUCENE-1260?

> Impossible to use custom norm encoding/decoding
> -----------------------------------------------
>                 Key: LUCENE-1261
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3.1
>         Environment: All
>            Reporter: John Adams
>            Assignee: Otis Gospodnetic
>            Priority: Minor
> Although it is possible to override the methods encodeNorm and decodeNorm in a custom Similarity
class, these methods are not actually used by the query processing and scoring functions,
nor by the indexing functions. The relevant Lucene classes all call "Similarity.decodeNorm"
rather than "similarity.decodeNorm", i.e. the norm encoding/decoding is fixed to use that
of the base Similarity class. Index-writing classes such as DocumentWriter likewise use "Similarity.encodeNorm"
rather than "similarity.encodeNorm", so we are stuck with the 3-bit mantissa encoding implemented
by SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat.
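
The dispatch problem described above can be illustrated with a minimal Java sketch. The classes BaseSim, CustomSim and DispatchSketch are hypothetical stand-ins, not the real Lucene classes; the sketch assumes the norm methods are invoked through the class name, as the report describes.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class BaseSim {
    // Static method: the call target is fixed at compile time by the class name.
    static String encodingName() { return "byte315"; }

    // Instance method: the call target is chosen at run time from the object.
    String instanceEncodingName() { return "byte315"; }
}

class CustomSim extends BaseSim {
    // This hides BaseSim.encodingName(); it does not override it.
    static String encodingName() { return "byte52"; }

    @Override
    String instanceEncodingName() { return "byte52"; }
}

final class DispatchSketch {
    // Mirrors a call written as "Similarity.decodeNorm": always the base version.
    static String viaClassName() {
        return BaseSim.encodingName();
    }

    // Mirrors a call written as "similarity.decodeNorm": honours the subclass.
    static String viaInstance(BaseSim sim) {
        return sim.instanceEncodingName();
    }

    public static void main(String[] args) {
        System.out.println(viaClassName());
        System.out.println(viaInstance(new CustomSim()));
    }
}
```

A call through the class name never reaches the custom subclass, whereas a call through an instance does, which is why replacing the former with the latter is the proposed fix.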
> This is very restrictive and annoying, since in practice many users would prefer an encoding
that allows finer distinctions for boost and normalisation factors close to 1.0. For example,
SmallFloat.floatToByte52 uses 5 bits of mantissa, and this would be of great help in distinguishing
much better between subtly different lengthNorms and FieldBoost/DocumentBoost values.
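
To make the precision difference concrete, here is a self-contained sketch of a small-float byte encoding parameterised by mantissa width. It is a re-implementation written for illustration, modelled on the scheme the report attributes to SmallFloat, and should not be read as the library's own code. The class name NormPrecisionSketch is hypothetical; the parameter pairs (3, 15) and (5, 2) are assumed to correspond to the "315" and "52" variants.

```java
// Illustrative re-implementation; NormPrecisionSketch is a hypothetical name.
final class NormPrecisionSketch {

    // Encode a float into one byte with the given mantissa width and zero exponent.
    static byte floatToByte(float f, int numMantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << numMantissaBits;
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - numMantissaBits);
        if (small <= fzero) {
            // Underflow: zero and negatives map to 0, tiny positives to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (small >= fzero + 0x100) {
            // Overflow: clamp to the largest representable value.
            return -1;
        }
        return (byte) (small - fzero);
    }

    // Decode a byte produced by floatToByte with the same parameters.
    static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - numMantissaBits);
        bits += (63 - zeroExp) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] norms = {1.0f, 1.1f};
        for (float n : norms) {
            float coarse = byteToFloat(floatToByte(n, 3, 15), 3, 15);
            float fine = byteToFloat(floatToByte(n, 5, 2), 5, 2);
            System.out.println(n + " -> 3-bit: " + coarse + ", 5-bit: " + fine);
        }
    }
}
```

Under these assumptions, 1.0 and 1.1 collapse to the same byte with a 3-bit mantissa but map to different bytes with a 5-bit mantissa, at the cost of a narrower exponent range.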
> It should be easy to fix this by changing all instances of "Similarity.decodeNorm" and
"Similarity.encodeNorm" to "similarity.decodeNorm" and "similarity.encodeNorm" in the Lucene
code (there are only a few of each).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.