jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Slow full text query performance and Lucene Index handling in Oak
Date Tue, 08 Apr 2014 15:51:55 GMT
Hi,

As part of OAK-1702 I have added a benchmark to compare the
performance of Full text query search with JR2

Based on approach taken (which might be wrong) I get following numbers

Apache Jackrabbit Oak 0.21.0-SNAPSHOT
# FullTextSearchTest               C     min     10%     50%     90%
  max       N
Oak-Mongo                          1      58      71     101     119
  287     610
Oak-Mongo-FDS                      1      50      51      52      58
  184    1106
Oak-Tar                            1      39      40      40      44
   64    1459
Oak-Tar-FDS                        1      53      54      55      64
  197    1030
Jackrabbit                         1       4       4       5       6
  231   11385

Which shows that JR2 performs lot better for full text queries and
subsequent queries are quite faster once Lucene has warmed up.

Looking at current usage of Lucene in Oak and the way we store and
access the Lucene indexes [2] I have couple of doubts

1. Multiple IndexSearcher instances - Current impl would create a new
IndexSearcher for every Lucene query as the OakDirectory uses is bound
to NodeState of executing JCR session. Compared to this in JR2 we
probably had a singleton IndexSearcher which was shared across all the
query execution path. This would potentially cause performance issue
as Lucene is effectively used in a state less way and it has to
perform initialization for every call. As [3] the IndexSearcher must
be shared

2. Index Access - Currently we have custom OakDirectory which provides
access to Lucene indexes stored in NodeStore. Even with SegmentStore
which has memory mapped file the random access used by Lucene would
probably be lot slower with OakDirectory in comparison to default
Lucene MMapDirectory. For small setups where Lucene index can be
accomodated on each node I think it would be better if the index is
access from file system

Are the above concerns valid and should we relook into how we are
using Lucene in Oak?

Chetan Mehrotra
[1] https://issues.apache.org/jira/browse/OAK-1702
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java
[3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Mime
View raw message