lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1908) Similarity javadocs for scoring function to relate more tightly to scoring models in effect
Date Mon, 14 Sep 2009 22:07:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755219#action_12755219 ]

Doron Cohen commented on LUCENE-1908:
-------------------------------------

{quote}
The rationale behind the coarseness of the norms is that since the accuracy of
search engines in retrieving the documents that the user really wants is so
poor, only big differences matter.  (It's not just poor "recall" against a
given query, but the difficulty that the user experiences in formulating a
proper query to express what they're really looking for in the first place.)

Doug wrote at least once about this some years back, but I haven't been
able to track down the post.
{quote}

Thanks!  I too failed to find that post.

I like the part about users' difficulty in expressing their information need in a query.

So I am updating like this:

{noformat}
However, the resulting norm value is encoded as a single byte before being 
stored. At search time, the norm byte value is read from the index directory 
and decoded back to a float norm value. This encoding/decoding, while reducing 
index size, comes at the price of precision loss - it is not guaranteed that 
decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.875. 
 
Compression of norm values to a single byte saves memory at search time, 
because once a field is referenced at search time, its norms - for all 
documents - are maintained in memory. 
 
The rationale supporting such lossy compression of norm values is that, 
given how difficult (and inexact) it is for users to express their true 
information need in a query, only big differences matter. 
 
Last, note that search time is too late to modify this norm part of scoring, 
e.g. by using a different Similarity for search. 
{noformat}
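
The lossy round trip described above can be illustrated with a minimal sketch. It assumes a one-byte float layout modeled on the scheme in Lucene's SmallFloat utility (three low "mantissa" bits, five "exponent" bits, zero-exponent point at 15); the class name NormByteSketch and its encode/decode methods are hypothetical, and the constants should be checked against the Lucene version in use before relying on them.

```java
// Hypothetical sketch, not Lucene source code.
// Assumed layout: 8-bit float, zero-exponent point 15, truncating (not rounding).
final class NormByteSketch {
    // Shifted bit pattern at or below which a value collapses to byte 0 or 1.
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Compress a float into one byte by keeping only its top bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_POINT) {
            // Zero and negatives map to 0; tiny positives map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    // Expand a byte back to a float; the discarded low bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.89f, 0.7071f, 0.5f};
        for (float x : samples) {
            System.out.println(x + " -> " + decode(encode(x)));
        }
    }
}
```

Under these assumptions, 1.0 and 0.5 survive the round trip unchanged, while 0.89 comes back as 0.875 and 0.7071 as 0.625: only four distinct values exist between each pair of adjacent powers of two, so small differences between norms disappear.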

> Similarity javadocs for scoring function to relate more tightly to scoring models in effect
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1908
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1908
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1908.patch, LUCENE-1908.patch, LUCENE-1908.patch, LUCENE-1908.patch
>
>
> See discussion in the related issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

