lucene-java-user mailing list archives

From Doron Cohen
Subject Re: Sorting by a percentage On a field
Date Sat, 16 Dec 2006 01:03:01 GMT
I think the right solution for this would use "payloads", where extra data
can be attached to each indexed token. However, Lucene currently does not
support this. Without payloads I can think of two options, each with its own
drawbacks:

1) more tokens at indexing time - decide on the resolution of the
percentage - say it is 5% - and repeat each attribute's token once per 5%
step. For example, the attributes field for product A in your example would
look like: "2 2 2 2 2 2 2 2 2 2 5 5 5 5 5 5 3 3 13 13 13 13 13 13".

2) more tokens at search time - at indexing, include the percentage in the
token. So for product A you would have: "2x50 5x30 3x10 13x30". At search
time, expand the query into one clause per possible percentage, each boosted
by that percentage, so that a higher percentage scores higher. So the query
for attribute 5 would be expanded to:
"5x5^5 5x10^10 5x15^15 5x20^20 ... 5x95^95 5x100^100".
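
Both encodings above can be sketched in plain Java. This is only an
illustration under assumptions: the class and method names are hypothetical,
no Lucene classes are used, and it assumes the field is split on whitespace
and that percentages are rounded down to the 5% resolution.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper; builds the field and query strings only. */
class PercentageEncoding {
    static final int RESOLUTION = 5; // percent per step, as in the example

    /** Option 1: repeat each attribute id once per RESOLUTION percent. */
    static String repeatedTokens(Map<Integer, Integer> percentByAttribute) {
        StringJoiner field = new StringJoiner(" ");
        for (Map.Entry<Integer, Integer> e : percentByAttribute.entrySet()) {
            int copies = e.getValue() / RESOLUTION; // rounds down to a step
            for (int i = 0; i < copies; i++) {
                field.add(String.valueOf(e.getKey()));
            }
        }
        return field.toString();
    }

    /** Option 2, index side: one "<id>x<percent>" token per attribute. */
    static String percentTokens(Map<Integer, Integer> percentByAttribute) {
        StringJoiner field = new StringJoiner(" ");
        for (Map.Entry<Integer, Integer> e : percentByAttribute.entrySet()) {
            int snapped = (e.getValue() / RESOLUTION) * RESOLUTION;
            if (snapped == 0) {
                continue; // below one step: no query clause could match it
            }
            field.add(e.getKey() + "x" + snapped);
        }
        return field.toString();
    }

    /** Option 2, search side: one boosted clause per possible percentage. */
    static String expandQuery(int attributeId) {
        StringJoiner query = new StringJoiner(" ");
        for (int p = RESOLUTION; p <= 100; p += RESOLUTION) {
            query.add(attributeId + "x" + p + "^" + p);
        }
        return query.toString();
    }
}
```

For a search on several attributes, the expansions for each attribute would
simply be concatenated into one query string.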

The first approach enlarges the index (up to 20 tokens per attribute at a
5% resolution), so if you have lots of data, index size could eventually
become a problem.
The second approach ends up with a large query (20 clauses per attribute at
a 5% resolution), so, again, if you have lots of data, search time could
eventually become a problem.

Also, depending on how strict you want the scoring to be, you may want to
omit norms for this field, so that the length of the field does not affect
the score.
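
To illustrate why norms matter here, the following is a rough sketch that
assumes Lucene's classic TF-IDF similarity, where tf = sqrt(freq) and the
length norm is 1/sqrt(number of tokens in the field). It leaves out idf,
boosts, query normalization and the lossy encoding of norms, and the class
and method names are hypothetical.

```java
/** Hypothetical helper; approximates one term's contribution to the score. */
class NormEffect {
    /** tf * lengthNorm under the assumed classic formulas. */
    static double contribution(int termFreq, int fieldLength, boolean omitNorms) {
        double tf = Math.sqrt(termFreq);
        // With norms omitted, field length no longer scales the score.
        double lengthNorm = omitNorms ? 1.0 : 1.0 / Math.sqrt(fieldLength);
        return tf * lengthNorm;
    }
}
```

With the first approach, a product whose percentages add up to more has a
longer field, so with norms enabled each of its attributes is scaled down.
Note also that under this assumption the score grows with the square root of
the token count rather than linearly with the percentage.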

Hope this helps,

mmoser wrote on 15/12/2006 13:17:05:
> I am still new to Lucene, so please take this into consideration when
> reading this. Up until now, a novice like myself has been able to finagle
> Lucene into doing what we want. But now we have a problem that I have been
> searching for the answer to. We allow users to profile our products with
> predetermined profile attribute ids. We then take all the users'
> profiles on a product, count the number of times that a
> particular profile attribute id has been chosen, and come out with a
> percentage for it. This is no problem. Where the problem comes into play is
> that we want the user to be able to search for products that match a
> particular profile attribute id. We want the higher percentages to come out
> on top. To add to the complexity, we want to be able to allow the user
> to select multiple profile attribute ids and still have a combined
> score come up higher. Keep in mind, we would like to somehow keep this
> in one field, because we are trying to use the same algorithm for
> that could potentially become very large. Any suggestions? The more
> the better.
> Example:
> Product A
> Attribute ID = 2    Percentage Chosen = 50%
> Attribute ID = 5    Percentage Chosen = 30%
> Attribute ID = 3    Percentage Chosen = 10%
> Attribute ID = 13    Percentage Chosen = 30%
> Product B
> Attribute ID = 1    Percentage Chosen = 50%
> Attribute ID = 2    Percentage Chosen = 20%
> Attribute ID = 3    Percentage Chosen = 75%
> So if a user selected the attributes that correspond to 2 and 3, then
> Product B should show up before Product A because it has a combined score
> of 95% and A has a combined score of 60%.
> Thanks for any help.
