lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Boosting results
Date Fri, 07 Nov 2008 17:46:17 GMT

This is a good point.

Sorting populates the field cache (internal to Lucene) for that field,  
meaning it loads all values for all docs and holds them in memory.   
This makes the first query slow, and, consumes RAM, in proportion to  
how large your index is.

Whereas boosting should be able to achieve the use case without these  
limitations.

Mike

Matthew DeLoria wrote:

> This actually brings up an interesting question, and something I  
> have been
> curious about.
>
> In this case, does it make more sense to do Boosting by Category, or  
> to do
> sorting? From what I understand, Lucene sorting involves putting the
> relevant fields into memory, and then executing a sort.
>
> Is this how sorting actually works in Lucene? If so, is it even a  
> good idea
> considering the large data sets in Lucene? What would really be the
> difference between sorting and boosting?
>
> M
>
> On Fri, Nov 7, 2008 at 7:59 AM, Erick Erickson <erickerickson@gmail.com 
> >wrote:
>
>> duuuuh, sorting. I absolutely love it when I overlook the obvious  
>> <G>.
>>
>> Erick@ThatSlappingSoundYouHearIsMyHandAgainstMyForehead
>>
>> On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>>
>>> Couldn't you just do a single Query that sorts first by category and
>> second
>>> by relevance?
>>>
>>> Mike
>>>
>>>
>>> Erick Erickson wrote:
>>>
>>> It seems to me that the easiest thing would be to fire two queries  
>>> and
>>>> then just concatenate the results
>>>>
>>>> category:A AND body:fred
>>>>
>>>> category:B AND body:fred
>>>>
>>>>
>>>> If you really, really didn't want to fire two queries, you could  
>>>> create
>>>> filters on category A and category B and make a couple of
>>>> passes through your results seeing if the returned documents were  
>>>> in
>>>> the filter, but you'd still concatenate the results. Actually in  
>>>> your
>>>> specific example you could make one filter on A.....
>>>>
>>>> You could also consider a custom scorer that, added 1,000,000 to  
>>>> every
>>>> category A document.
>>>>
>>>> How much were you boosting by? What happens if you boost by a  
>>>> very large
>>>> factor?
>>>> As in ridiculously large?
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Thu, Nov 6, 2008 at 7:42 PM, Scott Smith <ssmith@mainstreamdata.com
>>>>> wrote:
>>>>
>>>> I'm interested in comments on the following problem.
>>>>>
>>>>>
>>>>>
>>>>> I have a set of documents.  They fall into 3 categories.  Call  
>>>>> these
>>>>> categories A, B, and C.  Each document has an indexed, non- 
>>>>> tokenized
>>>>> field called "category" which contains A, B, or C (they are  
>>>>> mutually
>>>>> exclusive categories).
>>>>>
>>>>>
>>>>>
>>>>> All of the documents contain a field called "body" which  
>>>>> contains a
>>>>> bunch of text.  This field is indexed and tokenized.
>>>>>
>>>>>
>>>>>
>>>>> So, I want to do a search which looks something like:
>>>>>
>>>>>
>>>>>
>>>>> (category:A OR category:B) AND body:fred
>>>>>
>>>>>
>>>>>
>>>>> I want all of the category A documents to come before the  
>>>>> category B
>>>>> documents.  Effectively, I want to have the category A documents  
>>>>> first
>>>>> (sorted by relevancy) and then the category B documents after  
>>>>> (sorted
>> by
>>>>> relevancy).
>>>>>
>>>>>
>>>>>
>>>>> I thought I could do this by boosting the category portion of the
>> query,
>>>>> but that doesn't seem to work consistently.  I was setting the  
>>>>> boost on
>>>>> the category A term to 1.0 and the boost on the category B term  
>>>>> to 0.0.
>>>>>
>>>>>
>>>>>
>>>>> Any thoughts how to skin this?
>>>>>
>>>>>
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>
>
>
> -- 
> Matthew P. DeLoria
> matthew.deloria@gmail.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message