Date: Wed, 10 Mar 2010 18:02:30 -0800 (PST)
From: Chris Hostetter <hossman_lucene@fucit.org>
To: general@lucene.apache.org
Subject: Re: How to do prefix/phrase matching with term-length-sensitive
 scoring?
In-Reply-To: <1267630059.30186.2.camel@seraphim>
Message-ID: <Pine.LNX.4.64.1003101757390.6808@radix.cryptio.net>
References: <1267630059.30186.2.camel@seraphim>


: Given a list of prefixes, what is the simplest way to match them against
: a text field, giving preference to shorter term matches?

I would suggest using edge n-grams (index every leading prefix of each
term, so a prefix query becomes an exact term match) and sorting on a
numeric field containing the "length" of the term.
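
A minimal sketch of that idea, in plain Python rather than Lucene's API.
Everything here (ToyIndex, edge_ngrams, one term per document) is a
hypothetical simplification for illustration, not how Lucene stores or
sorts anything internally.

```python
from collections import defaultdict


def edge_ngrams(term, min_len=1):
    """Return every leading prefix of `term` with at least `min_len` chars."""
    return [term[:i] for i in range(min_len, len(term) + 1)]


class ToyIndex:
    """Hypothetical in-memory index: one term per document, for simplicity."""

    def __init__(self):
        self.postings = defaultdict(set)  # gram -> set of doc ids (no tf kept)
        self.term_length = {}             # doc id -> the numeric "length" field

    def add(self, doc_id, term):
        term = term.lower()
        self.term_length[doc_id] = len(term)
        for gram in edge_ngrams(term):
            self.postings[gram].add(doc_id)

    def search(self, prefixes):
        """Match any of the prefixes, then sort by term length, shortest first."""
        hits = set()
        for prefix in prefixes:
            hits |= self.postings.get(prefix.lower(), set())
        # Doc id is only a tie-breaker so the order is deterministic.
        return sorted(hits, key=lambda d: (self.term_length[d], d))


index = ToyIndex()
index.add(1, "searching")
index.add(2, "search")
index.add(3, "sea")
index.add(4, "lucene")

print(index.search(["sea"]))  # -> [3, 2, 1]
```

Because every prefix is indexed as its own term, the lookup is a plain
dictionary hit, and the preference for shorter terms comes entirely from
the sort, not from scoring.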

:  * Term frequency within the field must be ignored when scoring.

You can omit term frequency info when indexing (sorting will make it
irrelevant, but there's no reason to waste the space).
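
To illustrate what dropping term frequency means, here is a toy
comparison in plain Python. The two helper functions are hypothetical
and say nothing about Lucene's actual on-disk format; they only show
that without frequencies a repeated term looks the same as a single
occurrence.

```python
from collections import Counter


def postings_with_tf(tokens):
    """Keep how often each term occurs in the field."""
    return dict(Counter(tokens))


def postings_without_tf(tokens):
    """Keep only which terms occur; repeats collapse to one entry."""
    return set(tokens)


repeated = ["sea", "sea", "sea"]
single = ["sea"]

# With frequencies the two fields differ, so a tf-based score would too.
print(postings_with_tf(repeated) == postings_with_tf(single))        # -> False

# Without frequencies they are indistinguishable, and less data is stored.
print(postings_without_tf(repeated) == postings_without_tf(single))  # -> True
```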

:  * Documents and fields are sometimes boosted at index time; norms are
: present.

Hmmm, well that makes the sorting more complicated, but in that case you
can either fold the boost value into your special "length" field to
get your own magic number for sorting the results, or use a function
query based approach to meld the (norm influenced) score with your own
length field.
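
The two options can be sketched roughly as follows, again in plain
Python. Both formulas (length divided by boost, and score divided by
1 + log(length)) are arbitrary assumptions picked for illustration, as
are the function names and sample values; the right blend depends on
how much weight the boosts should carry.

```python
import math


def index_time_sort_key(term_length, boost=1.0):
    """Option 1: fold the boost into the stored sort value.

    Smaller sorts first; a boost above 1 makes a term look "shorter".
    Assumes boost > 0.
    """
    return term_length / boost


def query_time_score(relevance_score, term_length):
    """Option 2: meld a (norm influenced) score with the length at query time.

    Larger is better; longer terms are damped logarithmically.
    Assumes term_length >= 1.
    """
    return relevance_score / (1.0 + math.log(term_length))


# Hypothetical docs: name -> (term length, index-time boost / resulting score)
docs = {"a": (6, 1.0), "b": (9, 2.0), "c": (3, 1.0)}

by_sort_key = sorted(docs, key=lambda d: index_time_sort_key(*docs[d]))
by_blended_score = sorted(
    docs, key=lambda d: query_time_score(docs[d][1], docs[d][0]), reverse=True
)

print(by_sort_key)       # -> ['c', 'b', 'a']
print(by_blended_score)  # -> ['b', 'c', 'a']
```

The first option keeps query time cheap because the magic number is
computed once at index time, while the second leaves the stored length
untouched and lets the blend be tuned per query.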


-Hoss