lucy-user mailing list archives

From "Thomas den Braber" <tho...@delos.nl>
Subject Re: [lucy-user] Hits offset and search performance
Date Mon, 12 Nov 2012 09:53:59 GMT
On Sun, Nov 11, 2012 at 04:19 AM, Marvin Humphrey <marvin@rectangular.com> wrote:

> I don't know how Swish-e implements sorting of hits, but this is expected
> behavior in Lucy.

Swish-e can presort attributes during indexing:
'By default Swish-e generates presorted tables while indexing for each property name. This
allows faster sorting when generating results. On large document collections this
presorting may add to the indexing time, and also adds to the total size of the index.
This directive can be used to customize exactly which properties will be presorted.'

Maybe this does the trick?
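
As a rough illustration of what a presorted property table buys, here is a minimal Python sketch. It is not Swish-e or Lucy code; the function names and the sample documents are hypothetical. The idea is that the expensive comparison of property values happens once at index time, so sorting hits at search time only compares small precomputed integers.

```python
def build_presorted_table(docs, prop):
    """Index time: map each doc id to the rank of its property value."""
    # Sort doc ids by the property value once, up front.
    order = sorted(range(len(docs)), key=lambda doc_id: docs[doc_id][prop])
    ranks = [0] * len(docs)
    for rank, doc_id in enumerate(order):
        ranks[doc_id] = rank
    return ranks


def sort_hits(hit_ids, ranks):
    """Search time: order matching doc ids using only the integer ranks."""
    return sorted(hit_ids, key=lambda doc_id: ranks[doc_id])


# Hypothetical sample collection; doc id is the list position.
docs = [
    {"path": "a.txt", "mtime": 300},
    {"path": "b.txt", "mtime": 100},
    {"path": "c.txt", "mtime": 200},
]
mtime_ranks = build_presorted_table(docs, "mtime")
hits_by_mtime = sort_hits([0, 2], mtime_ranks)
```

The trade-off is the one the quoted text describes: one extra table per presorted property, paid for in indexing time and index size.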

>> I would expect that using the offset, performance should be higher because
>> no processing needs to be done to the hits before the offset (no score
>> calculation).

> How do you know that the hit number 5000 actually ranks 5000th in sort order
> unless you calculate scores for all documents and perform sorting?

> There are certain times when Lucy can avoid calculating scores -- when
> SortSpecs do not require scores, or when documents match pure negative clauses
> (docs matching "bar" in the query `foo AND NOT bar`).  But when you are
> ranking documents based on score, we have to calculate a score for **every**
> document.

Sorry, I didn't mention this, but I really meant sorting by attributes other than the score,
such as modification date or file size. Does the score still need to be calculated in that case?

> I would assume that Swish-e and Lucy are implemented differently.  I don't
> know what seek() does in the context of Swish-e.

Seek fast-forwards through the search results without reading the results that exist before
the seek pointer, and without first specifying the total number of hits you want to
collect. In Swish-e you also do not have to say in advance how many hits you want.
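
For contrast, a generic way to serve a page of hits sorted by an attribute when no presorted order is available looks roughly like the Python sketch below. This is an assumption-laden illustration, not the actual Swish-e or Lucy implementation; `page_of_hits` and the sample data are hypothetical. No score is computed, but every matching document is still visited and the best `offset + num_wanted` entries must be kept before the first `offset` can be discarded.

```python
import heapq


def page_of_hits(matching_docs, sort_key, offset, num_wanted):
    """Return hits [offset, offset + num_wanted) in ascending sort_key order."""
    # Every match is examined; only the best offset + num_wanted are retained.
    top = heapq.nsmallest(offset + num_wanted, matching_docs, key=sort_key)
    # The hits before the offset were needed to find the page, then dropped.
    return top[offset:]


# Hypothetical matches, sorted here by file size instead of score.
matches = [{"id": i, "size": s} for i, s in enumerate([50, 10, 40, 20, 30])]
page = page_of_hits(matches, lambda doc: doc["size"], offset=2, num_wanted=2)
page_ids = [doc["id"] for doc in page]
```

With this approach the work grows with the offset, which is why a large offset does not come for free even when scores are not involved.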

I can overcome the absence of such a command in Lucy by tweaking my program and moving
some of my logic to an earlier stage.

I will continue my migration and will let you know if there are 'more bumps on the road'.

I can also make a more detailed performance comparison if you like.

Thomas


