jackrabbit-oak-dev mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Re: Slow full text query performance and Lucene Index handling in Oak
Date Wed, 09 Apr 2014 15:25:09 GMT
Current update:

1. Tommaso provided a patch (OAK-1702) to disable compression, and that
also helps quite a bit.
2. We currently store the full tokenized text in the Lucene index [1],
which makes fetching document fields slower. With that storage disabled
the numbers improve quite a bit. The stored text was added as part of
OAK-319 to support MLT (MoreLikeThis).

# FullTextSearchTest               C     min     10%     50%     90%     max       N
Oak-Tar (codec)                    1       9       9      10      12      41    5664
Oak-Tar (codec,mlt off)            1       7       8       8      10      21    6921
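
The columns above summarise the per-iteration timings of each run. A minimal sketch of how such a summary can be computed, assuming the timings are in milliseconds, N is the number of iterations, and percentiles use the nearest-rank definition (the Oak benchmark runner may compute them differently). The class name `TimingSummary` and the sample timings are hypothetical, not measurements from this thread.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: condenses per-iteration timings into the columns of the
 * FullTextSearchTest output (min, 10%, 50%, 90%, max, N).
 * Assumes nearest-rank percentiles; the Oak benchmark may use another method.
 */
class TimingSummary {

    // Nearest-rank percentile of a sorted array: the smallest sample such that
    // at least p percent of all samples are <= it, i.e. rank = ceil(p * n / 100).
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100;
        return sorted[Math.max(rank, 1) - 1];
    }

    // Returns {min, p10, p50, p90, max, N} for the given timings (assumed ms).
    static long[] summarise(long[] timingsMs) {
        long[] sorted = timingsMs.clone();
        Arrays.sort(sorted);
        return new long[] {
            sorted[0],                 // min
            percentile(sorted, 10),    // 10%
            percentile(sorted, 50),    // 50% (median)
            percentile(sorted, 90),    // 90%
            sorted[sorted.length - 1], // max
            sorted.length              // N: number of iterations
        };
    }

    public static void main(String[] args) {
        // Made-up sample timings, not data from this thread.
        long[] timings = {9, 12, 10, 8, 41, 10, 9, 11, 10, 9};
        System.out.println(Arrays.toString(summarise(timings))); // [8, 8, 10, 12, 41, 10]
    }
}
```

A single slow iteration (41 above) moves only the max column, which is why the percentile columns are the more stable basis for comparing runs.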

Will look into this further.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java#L44
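
Alex's suggestion in the message quoted below is that the QueryEngine should not sort results when sorting is unnecessary. A hypothetical sketch of that idea in plain Java: the caller states whether the index already returns rows in the requested order, and `Collections.sort` runs only when it does not. `OrderingSketch` and `orderIfNeeded` are illustrative names, not Oak's QueryEngine API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch (not Oak's QueryEngine API): sort query results in
 * memory only when the index does not already deliver them in the requested
 * order.
 */
class OrderingSketch {

    static <T> List<T> orderIfNeeded(List<T> rows, Comparator<? super T> orderBy,
                                     boolean indexAlreadyOrdered) {
        if (orderBy == null || indexAlreadyOrdered) {
            // No 'order by', or the index already returns rows in this order:
            // skip the in-memory sort entirely.
            return rows;
        }
        // Fallback: copy the rows and sort them in memory.
        List<T> sorted = new ArrayList<>(rows);
        Collections.sort(sorted, orderBy);
        return sorted;
    }

    public static void main(String[] args) {
        // Scores as returned by a (hypothetical) index, best match first.
        List<Double> byScore = Arrays.asList(0.9, 0.7, 0.4);
        List<Double> a = orderIfNeeded(byScore, Comparator.reverseOrder(), true);
        System.out.println(a == byScore); // true: no sort performed

        List<Double> unordered = Arrays.asList(0.4, 0.9, 0.7);
        List<Double> b = orderIfNeeded(unordered, Comparator.reverseOrder(), false);
        System.out.println(b); // [0.9, 0.7, 0.4]
    }
}
```

The sketch decides once per query from what the index reports instead of inspecting the rows; an O(n) "already sorted?" check would be a reasonable alternative when the index gives no such guarantee.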

On Wed, Apr 9, 2014 at 7:15 PM, Alex Parvulescu
<alex.parvulescu@gmail.com> wrote:
> Aside from the compression issue, there was another one related to the
> 'order by' clause. I saw Collections.sort taking up as much as 23% of the
> time.
>
> I removed the order by temporarily so it doesn't get in the way of the
> Lucene stuff, but I think the QueryEngine should skip ordering results in
> this case.
>
> On Wed, Apr 9, 2014 at 3:31 PM, Tommaso Teofili
> <tommaso.teofili@gmail.com> wrote:
>
>> I'm looking into the Lucene codecs right now.
>>
>> Tommaso
>>
>>
>> 2014-04-09 15:20 GMT+02:00 Alex Parvulescu <alex.parvulescu@gmail.com>:
>>
>> > > Profiling the result shows that quite a bit of time goes into
>> > > org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). I
>> > > think this is part of Lucene 4.x and not present in 3.x. Any idea if I
>> > > can disable compression?
>> >
>> > +1, I noticed that too. We should try disabling compression and compare
>> > the results.
>> >
>> > alex
>> >
>> >
>> > On Wed, Apr 9, 2014 at 3:16 PM, Chetan Mehrotra
>> > <chetan.mehrotra@gmail.com> wrote:
>> >
>> > > On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting <jukka.zitting@gmail.com>
>> > > wrote:
>> > > > Is that a common use case? To better simulate a normal usage scenario,
>> > > > I'd make the benchmark fetch up to N results (where N is configurable,
>> > > > with a default of something like 20) and access the path and the title
>> > > > property of the matching nodes.
>> > >
>> > > I changed the benchmark logic in http://svn.apache.org/r1585962.
>> > > With that, JR2 slows down a bit:
>> > >
>> > > # FullTextSearchTest               C     min     10%     50%     90%     max       N
>> > > Oak-Tar                            1      34      35      36      39      60    1639
>> > > Jackrabbit                         1       5       5       6       7      68   10038
>> > >
>> > > Profiling the result shows that quite a bit of time goes into
>> > > org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). I
>> > > think this is part of Lucene 4.x and not present in 3.x. Any idea if I
>> > > can disable compression?
>> > >
>> > > Chetan Mehrotra
>> > >
>> >
>>
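
Jukka's proposed benchmark change (quoted above) reads at most N results, with N configurable and defaulting to about 20, and touches only the path and title of each hit. A stdlib-only sketch of that loop follows; `FetchTopN` and `Hit` are hypothetical stand-ins for the JCR query result rows the real benchmark would iterate over.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of the proposed benchmark loop: read at most N hits
 * (N configurable, default 20) and access only the path and title of each.
 * Hit stands in for a JCR query result row; the real benchmark would go
 * through the JCR query API instead.
 */
class FetchTopN {

    static final int DEFAULT_MAX_RESULTS = 20;

    static final class Hit {
        final String path;
        final String title;

        Hit(String path, String title) {
            this.path = path;
            this.title = title;
        }
    }

    // Consumes at most maxResults hits from the iterator and returns what a
    // search page would typically display: "path | title".
    static List<String> readTopHits(Iterator<Hit> hits, int maxResults) {
        List<String> shown = new ArrayList<>();
        while (shown.size() < maxResults && hits.hasNext()) {
            Hit hit = hits.next();
            shown.add(hit.path + " | " + hit.title);
        }
        return shown;
    }

    public static void main(String[] args) {
        List<Hit> all = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            all.add(new Hit("/content/doc" + i, "Title " + i));
        }
        List<String> shown = readTopHits(all.iterator(), DEFAULT_MAX_RESULTS);
        System.out.println(shown.size()); // 20
        System.out.println(shown.get(0)); // /content/doc0 | Title 0
    }
}
```

Stopping after N hits means the cost per iteration is dominated by how quickly the first results can be produced and their fields loaded, which is where stored-field decompression shows up.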
