lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alberto Gimeno <>
Subject Re: Custom scoring algorithm
Date Fri, 13 Nov 2009 17:09:46 GMT
Hi again.

I've made a proof of concept using the boost factor. I have done the
following: add a field for each feature and put the field boost factor
as the feature value.

	private static void addDocument(String id, Map<String, Integer>
features, IndexWriter writer) throws IOException {
		Document doc = new Document();

		doc.add(new Field("id", id,
                Field.Store.YES, Field.Index.NO));
		for (String key : features.keySet()) {
			Field field = new Field(key, key,
	                Field.Store.YES, Field.Index.ANALYZED);

But I don't know if this is the best way of doing this.


On Fri, Nov 13, 2009 at 3:42 PM, Alberto Gimeno <> wrote:
> Hi.
> I am developing an application and I would like to add searching
> capabilities. I have a database with items. Each item has a number of
> "features" with a numeric value. Example: feature_x=100,
> feature_y=200. Items can have common or different "features". And they
> can have a variable number of "features" as well. In the whole
> application can be hundreds of different features. I need users to be
> able to make queries by features and I need the results to be sorted
> as the sum of those features. For example if a user looks for "x, y,
> z", the first result should be the item with the greatest
> (features_x+feature_y+feature_z).
> I think Lucene can be a solution. But I don't need some of its
> features such as Analyzers, Filters, Tokenizers... I think I should
> implement my own scoring algorithm. How can I do that in the easiest
> possible way? I have read something about payloads. Can they be useful
> for my needings?
> My first attempt was to generate dumb strings containing each feature
> name repeated as many times as the feature value. Example: "x, x, x,
> x, y, y, z, z". Of course I know this is a very poor solution. And I
> also have seen that it doesn't work as I expected because the default
> scoring algorithm is much more complex than just counting words.
> Thank you very much in advance.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message