lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Baby steps towards making Lucene's scoring more flexible...
Date Mon, 15 Mar 2010 12:49:46 GMT
>>> But I don't like baking in search concepts at index time...
>>
> Many scoring models are possible if you store enough stats in the
> index.
>

in general the missing stats seem to fit in two buckets/categories:

1) length normalization pivot: average length in bytes, terms, or unique terms
2) term frequency normalization factor: max or average tf for the field

you never need more than one of each category for the same field. one
approach would be for the search-time similarity to simply refer to
these two generic names (i guess they could get some placeholder value
if they are not available), and at index time you make sure you put the
stat you want (or none at all) in each "bucket".
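
a rough sketch of the two-bucket idea, in case it helps. everything here
is hypothetical: GenericFieldStats, its methods, the placeholder value of
1.0 and the toy score formula are made up for illustration and are not
Lucene APIs or a proposed scoring model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in Lucene. */
public class GenericFieldStats {
    // Assumed placeholder used when nothing was stored at index time.
    public static final double PLACEHOLDER = 1.0;

    // One slot per category per field: the two "buckets".
    private final Map<String, Double> lengthPivot = new HashMap<>();
    private final Map<String, Double> tfNorm = new HashMap<>();

    // Index time: the indexer decides which concrete statistic fills each
    // bucket (e.g. average length in terms, or max tf), or leaves it empty.
    public void putLengthPivot(String field, double value) {
        lengthPivot.put(field, value);
    }

    public void putTfNorm(String field, double value) {
        tfNorm.put(field, value);
    }

    // Search time: the similarity only knows the generic names and gets
    // the placeholder if the bucket was never filled.
    public double lengthPivot(String field) {
        return lengthPivot.getOrDefault(field, PLACEHOLDER);
    }

    public double tfNorm(String field) {
        return tfNorm.getOrDefault(field, PLACEHOLDER);
    }

    /** Toy score: tf scaled by the tf bucket, divided by length relative to the pivot. */
    public static double score(GenericFieldStats stats, String field,
                               double tf, double docLength) {
        double normalizedTf = tf / stats.tfNorm(field);
        double relativeLength = docLength / stats.lengthPivot(field);
        return normalizedTf / relativeLength;
    }
}
```

the point is that score() never needs to know whether the pivot was
measured in bytes, terms or unique terms, or whether the tf factor is a
max or an average; that choice is made once, when the index is built.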


-- 
Robert Muir
rcmuir@gmail.com


