jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: query performance
Date Wed, 23 Jun 2010 09:55:52 GMT
On Wed, Jun 23, 2010 at 11:43 AM, JOSE FELIX HERNANDEZ BARRIO
<jose.hernandez@isthari.com> wrote:

>>
>> how long does the second time take, for example when you search on
>> 'B22222%' order by test:Id.
>
> you're right, the second time it's really fast (0.06s)
>
>
>> The first time, Lucene has to read all
>> terms for test:Id into memory, which can take some time (also
>> depending on your FS and whether FS caches are warm). Anyways, if all
>> your 1.000.000 nodes contain a title, all have to be read into memory
>> for sorting. After the first time, this is cached and it should be
>> fast.
>>
>
> but my question is: only 1000 records match the where clause (test:Id like
> 'B11111%'), so why is it necessary to read every record and not only the 1000
> in the result set, which are the ones to be sorted?
> it takes the same time to read and sort 1000 records as to sort the whole
> repository and take the first 100 results!

This is how Lucene, and I think inverted indexes in general, store
their contents on the filesystem: the index maps terms to documents,
not documents to terms. Fetching the terms through the matched
documents is much slower than just fetching all the terms for a field.
Also, I think this is out of scope for this list, since it is part of
Lucene. It is only slow the very first time. If that is really a
problem, you could try using an SSD or make sure that the FS caches
stay warm.
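
To illustrate the point, here is a minimal toy model in Python. It is not Lucene's or Jackrabbit's API: the class ToyIndex and all of its names are made up for this sketch. It assumes one value per document for the sort field. The index only knows term -> documents, so the first sorted query has to walk every term of the field once to build a document -> term-order array; later sorted queries reuse that array and only touch the matched documents.

```python
from bisect import bisect_left


class ToyIndex:
    """Hypothetical single-field inverted index: sorted terms plus postings lists."""

    def __init__(self, values_by_doc):
        postings = {}
        for doc_id, value in enumerate(values_by_doc):
            postings.setdefault(value, []).append(doc_id)
        self.num_docs = len(values_by_doc)
        self.terms = sorted(postings)                     # term dictionary
        self.postings = [postings[t] for t in self.terms]  # term -> doc ids
        self.sort_cache = None   # doc id -> term ordinal, built lazily
        self.terms_scanned = 0   # counts terms read while building the cache

    def prefix_query(self, prefix):
        """Cheap: seek to the first term >= prefix, walk while it matches."""
        i = bisect_left(self.terms, prefix)
        hits = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            hits.extend(self.postings[i])
            i += 1
        return hits

    def _load_sort_cache(self):
        """Expensive, once: walk ALL terms of the field to invert the index."""
        ordinals = [-1] * self.num_docs
        for ordinal, docs in enumerate(self.postings):
            self.terms_scanned += 1
            for doc_id in docs:
                ordinals[doc_id] = ordinal
        self.sort_cache = ordinals

    def search_sorted(self, prefix, limit):
        hits = self.prefix_query(prefix)
        if self.sort_cache is None:      # only the very first sorted query
            self._load_sort_cache()
        hits.sort(key=self.sort_cache.__getitem__)  # array lookup per hit
        return hits[:limit]


index = ToyIndex(["C3", "B11111b", "A1", "B11111a", "B2"])
first = index.search_sorted("B11111", 100)   # builds the cache over all terms
scanned_after_first = index.terms_scanned
second = index.search_sorted("B2", 100)      # reuses the cache
```

In this model the first call scans every term of the field even though only two documents match, and the second call scans none. That mirrors the behaviour described above: slow once, fast afterwards.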

Regards Ard

>
>
>>
>> Regards Ard
>>
>> >
>> > the same problem with sql and jcr-sql2
>> >
>> > any tip ?
>> > is this a bug in lucene search?
>> >
>> > --
>> > Jose Hernandez
>> > 675599600
>> > Isthari
>> > http://www.isthari.com
>> >
>>
>
>
>
> --
> Jose Hernandez
> 675599600
> Isthari
> http://www.isthari.com
>
