incubator-lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] Dynamic document boost
Date Sat, 11 Feb 2012 23:37:47 GMT
On Sat, Feb 11, 2012 at 11:27:10PM +0100, Nick Wellnhofer wrote:
> Thanks for pointing me to RequiredOptionalQuery. It looks very useful.
>
> I can't model the query to identify the subset directly in Lucy. The  
> subset is computed by some other code, so I think I'll end up with an  
> ORQuery with about 100 terms matching a StringType field containing an  
> external document id.

OK, that sounds like the right way to go.  Really big ORQueries can bog down,
but 100 terms, all of which are rare, that's not so bad.

>>> Is there a better way than to simply retrieve all the results, apply the
>>> boost factor manually to the scores and sort the results again?
>>
>> I hope you don't have to resort to post-search filtering.  That's slow to
>> begin with and it doesn't scale very well because of the costs of retrieving
>> so many documents.  You also have to resort to non-idiomatic sorting code
>> (using a priority queue rather than the Perl sort() function) if you don't
>> want memory usage to balloon.
>
> It wouldn't be too bad in my use case because the number of results is  
> limited. But I'm curious what the most scalable solution would look like.

I only mean that post-search sorting doesn't scale nearly as well as sorting
during the main search -- in terms of CPU cycles, I/O, or memory.  Sorting
during the main search uses a priority queue, and the sort caches we build at
index time are extremely efficient.
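
To make the priority-queue point concrete, here is a minimal sketch in Python rather than Perl. It is not Lucy's implementation, and the function names and the (doc_id, score) pair layout are hypothetical, chosen only for illustration. It contrasts sorting every hit after the fact with keeping only the best k hits in a bounded heap while the hits stream past.

```python
import heapq


def top_k_full_sort(scored_docs, k):
    """Post-search style: materialize every hit, sort them all, then slice."""
    return sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]


def top_k_priority_queue(scored_docs, k):
    """Streaming style: keep only the k best hits seen so far in a min-heap."""
    if k <= 0:
        return []
    heap = []  # entries are (score, doc_id); heap[0] is the weakest kept hit
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new hit beats the weakest kept hit, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    # Memory stays at O(k) no matter how many hits were scanned.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

With n hits, the full sort holds all n in memory and costs O(n log n), while the bounded heap holds k and costs O(n log k). The two functions return the same top k whenever scores are distinct; with tied scores the order among ties may differ.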

Marvin Humphrey

