lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <>
Subject [jira] Commented: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10
Date Thu, 24 Sep 2009 01:53:16 GMT


Lance Norskog commented on LUCENE-1360:

Is this code still interesting? That is, are there newer tools in Lucene that handle this?

I have found searching movie titles, product descriptions, etc. difficult to manage well.
Mainstream text retrieval research and applied technology are strongly biased towards
longer bodies of text.

> A Similarity class which has unique length norms for numTerms <= 10
> -------------------------------------------------------------------
>                 Key: LUCENE-1360
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Sean Timm
>            Assignee: Otis Gospodnetic
>            Priority: Trivial
>         Attachments:
> A Similarity class which extends DefaultSimilarity and simply overrides lengthNorm.
> lengthNorm is implemented as a lookup for numTerms <= 10, else as {{1/sqrt(numTerms)}}.
> This prevents term counts below 11 from sharing the same lengthNorm once it is stored as
> a single byte in the index.
> This is useful if your search is only on short fields such as titles or product descriptions.
> See mailing list discussion:
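
For readers skimming the issue, the lookup-then-fallback idea described above can be sketched roughly as follows. This is a minimal standalone illustration, not the attached patch: the class name ShortFieldLengthNorm and the values in NORM_TABLE are hypothetical placeholders, chosen only to be distinct and strictly decreasing. A real table would need values that stay distinct after Lucene's one-byte norm encoding, which should be checked against the encoder in the Lucene version in use. In Lucene itself the same logic would presumably live in a DefaultSimilarity subclass that overrides lengthNorm, rather than in a static helper.

```java
// Standalone sketch of a length norm with a lookup table for short fields.
// It does not depend on Lucene, so it can be compiled and run on its own.
final class ShortFieldLengthNorm {

    // Hypothetical placeholder values for numTerms = 1..10 (index = numTerms - 1).
    // These are NOT taken from the LUCENE-1360 patch.
    private static final float[] NORM_TABLE = {
        1.5f, 1.25f, 1.0f, 0.875f, 0.75f,
        0.625f, 0.5f, 0.4375f, 0.375f, 0.3125f
    };

    private ShortFieldLengthNorm() {
        // Utility class, no instances.
    }

    // Returns a table value for 1 <= numTerms <= 10, otherwise 1/sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        if (numTerms >= 1 && numTerms <= NORM_TABLE.length) {
            return NORM_TABLE[numTerms - 1];
        }
        // Fallback: the 1/sqrt(numTerms) formula quoted in the issue description.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Print the norm for a few field lengths on either side of the cutoff.
        for (int n = 1; n <= 12; n++) {
            System.out.println(n + " terms -> " + lengthNorm(n));
        }
    }
}
```

With these placeholder values each field length from 1 to 10 gets its own norm, and the table's last entry is still larger than 1/sqrt(11), so the norm keeps decreasing across the cutoff.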

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
