lucene-java-user mailing list archives

From mmoser <>
Subject Re: Sorting by a percentage On a field
Date Sat, 16 Dec 2006 05:09:27 GMT

Thanks for your reply. We had already considered both of these approaches,
so that definitely makes me feel better about things. For the first
approach, we had thought about applying it to our tagging problem to make a
product tag come to the top, but with a lot of tags the data becomes
extremely large. Even if we chose some fixed magnitude for the repetition,
the index could still become huge, because the number of tags is unbounded.
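
To make the storage cost of that first approach concrete, here is a minimal
sketch of how the repeated-token field could be generated. It uses plain Java
only (no Lucene calls); the class name RepeatedTokenField, the method build,
and the rounding rule are assumptions for illustration, not anything Doron
specified.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class RepeatedTokenField {
    // Builds the text of the attributes field for one product.
    // Each attribute id is repeated once per `resolution` percentage points,
    // rounded to the nearest step (the rounding rule is an assumption).
    static String build(Map<Integer, Integer> percentByAttribute, int resolution) {
        StringJoiner field = new StringJoiner(" ");
        for (Map.Entry<Integer, Integer> e : percentByAttribute.entrySet()) {
            int repeats = Math.round((float) e.getValue() / resolution);
            for (int i = 0; i < repeats; i++) {
                field.add(String.valueOf(e.getKey()));
            }
        }
        return field.toString();
    }

    public static void main(String[] args) {
        // Product A from the original example, at 5% resolution.
        Map<Integer, Integer> productA = new LinkedHashMap<>();
        productA.put(2, 50);
        productA.put(5, 30);
        productA.put(3, 10);
        productA.put(13, 30);
        System.out.println(build(productA, 5));
    }
}
```

At 5% resolution a single attribute at 100% costs 20 tokens, so the field
length grows with both the number of attributes and the chosen resolution.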

We had considered something very close to the second approach as well. My
concern is that when a query like the one you describe is mixed with other
fields, the overall query becomes extremely big. If a user were looking for
15 attribute ids, that alone would be 15 * 10 = 150 different ORed term
searches, and that is even if we only searched at every 10%.
Thank you very much for this input. I would love to hear if anyone has any
other approaches. If not, we will probably try these routes and see which
one is the bigger storage / performance hit.


Doron Cohen wrote:
> I think the right solution for this would use "payloads", where extra data
> can be added for each index token. However Lucene currently does not
> support this. Without this I can think of two options, each with its own
> disadvantage:
> 1) more tokens at indexing time - decide on the resolution of the
> percentage - say it is 5% - and add more tokens of the same. For example,
> the attributes field for product A in your example would look like: "2 2 2
> 2 2 2 2 2 2 2 5 5 5 5 5 5 3 3 13 13 13 13 13 13".
> 2) more tokens at search time - at indexing, include the percentage in the
> token. So for product A you would have: "2x50 5x30 3x10 13x30". At search
> time, expand the query accordingly. So the query for attribute 5 would be
> expanded to: "5x5^5 5x10^10 5x15^15 5x20^20 ... 5x95^95 5x100^100".
> The first approach would enlarge the index, so if you have lots of data
> that could eventually be a problem.
> The second approach would end up with a large query, so, again, if you
> have lots of data that could eventually be a problem with search time.
> Also, depending how strict you want the scoring to be, you may want to
> omit norms for this field.
> Hope this helps,
> Doron
> mmoser <> wrote on 15/12/2006 13:17:05:
>> So, I am still new to Lucene, so please take this into consideration when
>> reading this. Up until now, a novice like myself has been able to finagle
>> Lucene into doing what we want. But now we have a problem that I have been
>> searching for the answer to. We allow users to profile our products with a
>> predetermined profile attribute id. We then want to take all the users'
>> profiles on a product, count the number of times that a particular
>> profile attribute id has been chosen, and come out with a percentage for
>> it. This is no problem. Where the problem comes into play is that we want
>> the user to be able to search for products that match that particular
>> profile attribute id. We want the higher percentages to come up on top.
>> To add to the complexity, we want to allow the user to select multiple
>> profile attribute ids and still have products with a higher combined
>> score come up higher. Keep in mind, we would like to somehow keep these
>> in one field, because we are trying to use the same algorithm for
>> something that could potentially become very large. Any suggestions? The
>> more detail, the better.
>> Example:
>> Product A
>> Attribute ID = 2    Percentage Chosen = 50%
>> Attribute ID = 5    Percentage Chosen = 30%
>> Attribute ID = 3    Percentage Chosen = 10%
>> Attribute ID = 13    Percentage Chosen = 30%
>> Product B
>> Attribute ID = 1    Percentage Chosen = 50%
>> Attribute ID = 2    Percentage Chosen = 20%
>> Attribute ID = 3    Percentage Chosen = 75%
>> So if a user selected the attributes that correspond to 2 and 3, then
>> Product B should show up before Product A because it has a combined score
>> of 95% and A has a combined score of 60%.
>> Thanks for any help.
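
The ranking asked for in the example above amounts to summing the percentages
of the selected attribute ids per product. A small plain-Java sketch of that
arithmetic follows; the class name CombinedScore and the method combined are
illustrative only, and this is not how Lucene itself computes scores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinedScore {
    // Sums the percentages of the selected attribute ids for one product.
    // Attribute ids the product does not have contribute 0.
    static int combined(Map<Integer, Integer> percentByAttribute, List<Integer> selected) {
        int total = 0;
        for (int id : selected) {
            total += percentByAttribute.getOrDefault(id, 0);
        }
        return total;
    }

    // Product A from the example: 2 -> 50%, 5 -> 30%, 3 -> 10%, 13 -> 30%.
    static Map<Integer, Integer> productA() {
        Map<Integer, Integer> m = new HashMap<>();
        m.put(2, 50);
        m.put(5, 30);
        m.put(3, 10);
        m.put(13, 30);
        return m;
    }

    // Product B from the example: 1 -> 50%, 2 -> 20%, 3 -> 75%.
    static Map<Integer, Integer> productB() {
        Map<Integer, Integer> m = new HashMap<>();
        m.put(1, 50);
        m.put(2, 20);
        m.put(3, 75);
        return m;
    }
}
```

For the selection of attributes 2 and 3, Product B sums to a higher value than
Product A, so B should rank first.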