jackrabbit-dev mailing list archives

From "Christoph Kiehl (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-974) Manage Lucene FieldCaches per index segment
Date Wed, 20 Jun 2007 12:13:26 GMT

    [ https://issues.apache.org/jira/browse/JCR-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506506 ]

Christoph Kiehl commented on JCR-974:

Regarding your ItemStateManagerBasedSortComparator.patch: it doesn't work well in
our scenario because we've got fairly large result sets. I think your patch might handle small
result sets better than my patch, but for large result sets there are too many documents from
different index segments. With your patch my query takes about 100000ms, while with our patch
it takes between 200ms and 1000ms.

One of the other features of my patch is that it creates the caches lazily per index segment.
We also experimented with a global term cache, so that if the same term is returned by different
index segments the same String object is used in the FieldCache. This reduces the FieldCache
size when a term is contained in multiple index segments. In our case the default FieldCache
was about 4MB for a certain field while the patched FieldCache was about 2.5MB.
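
To illustrate the global term cache idea described above, here is a minimal sketch in plain
Java. It is not taken from the patch: the class SharedTermCache and its methods are
hypothetical names, and it uses no Lucene or Jackrabbit API. It only shows how equal terms
read from different segments can end up referencing one shared String instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keeps one canonical String instance per distinct term. */
final class SharedTermCache {
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the shared instance for a term, registering it on first use. */
    synchronized String intern(String term) {
        String existing = canonical.putIfAbsent(term, term);
        return existing != null ? existing : term;
    }

    /** Replaces every term read from one segment by its shared instance. */
    String[] canonicalize(String[] segmentTerms) {
        String[] result = new String[segmentTerms.length];
        for (int i = 0; i < segmentTerms.length; i++) {
            result[i] = intern(segmentTerms[i]);
        }
        return result;
    }

    /** Number of distinct terms seen so far. */
    synchronized int size() {
        return canonical.size();
    }
}
```

With this approach, a term that occurs in several segments is stored once, and each
per-segment array only holds a reference to it. How much memory this saves depends on how
many terms the segments have in common.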

> Manage Lucene FieldCaches per index segment
> -------------------------------------------
>                 Key: JCR-974
>                 URL: https://issues.apache.org/jira/browse/JCR-974
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3
>            Reporter: Christoph Kiehl
>         Attachments: ItemStateManagerBasedSortComparator.patch, patch.txt
> Jackrabbit uses an IndexSearcher which searches on a single IndexReader which is most
> likely to be an instance of CachingMultiReader. On every search that does sorting or range
> queries a FieldCache is populated and associated with this instance of a CachingMultiReader.
> On successive queries which operate on this CachingMultiReader you will get a tremendous speedup
> for queries which can reuse those associated FieldCache instances.
> The problem is that Jackrabbit creates a new CachingMultiReader _every time_ one of the
> underlying indexes is modified. This means if you just change _one_ item in the repository
> you will need to rebuild all those FieldCaches because the existing FieldCaches are associated
> with the old instance of CachingMultiReader.
> This not only leads to slow search response times for queries which contain range
> queries or are sorted by a field, but also leads to massive memory consumption (depending on
> the size of your indexes) because there might be multiple instances of CachingMultiReader
> in use if you have a scenario where a lot of queries and item modifications are executed concurrently.
> The goal is to keep those FieldCaches as long as possible.
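
The per-segment caching described in the issue can be sketched roughly as follows. This is
an illustration under stated assumptions, not the attached patch: Segment and
PerSegmentFieldCache are hypothetical stand-ins for an index segment reader and its cache,
and a list of segments plays the role of the combined reader. Because the cache is keyed by
segment instance, rebuilding the combined reader after one segment changes only loads values
for the new segment.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one index segment; not a Lucene or Jackrabbit class. */
final class Segment {
    final String[] fieldValues; // one field value per document in this segment

    Segment(String... fieldValues) {
        this.fieldValues = fieldValues;
    }
}

/** Caches field values per segment instance instead of per combined reader. */
final class PerSegmentFieldCache {
    private final Map<Segment, String[]> cache = new IdentityHashMap<>();
    private int loads = 0;

    /** Returns the values for a segment, building them lazily on first access. */
    synchronized String[] get(Segment segment) {
        String[] values = cache.get(segment);
        if (values == null) {
            // Stands in for the expensive enumeration of all terms of a field.
            values = segment.fieldValues.clone();
            cache.put(segment, values);
            loads++;
        }
        return values;
    }

    /** Number of times a segment had to be loaded from scratch. */
    synchronized int loadCount() {
        return loads;
    }

    /** Collects the cached values for every segment of a combined reader. */
    List<String[]> valuesFor(List<Segment> combinedReader) {
        List<String[]> result = new ArrayList<>();
        for (Segment segment : combinedReader) {
            result.add(get(segment));
        }
        return result;
    }
}
```

A real implementation would also need to release entries for segments that are no longer in
use, for example by holding them weakly; the sketch leaves that out to keep the idea visible.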

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
