jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-974) Manage Lucene FieldCaches per index segment
Date Fri, 22 Jun 2007 07:45:26 GMT

    [ https://issues.apache.org/jira/browse/JCR-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507153
] 

Marcel Reutegger commented on JCR-974:
--------------------------------------

Thanks a lot for the revised patch. I was just going to write some comments about your previous
version and now most of my concerns are already addressed ;)

One of the concerns I had was regarding lazy loading, because it would have required synchronization
on the map (which was missing in the previous patch).

I'm using a fairly simple test case to measure performance. It involves creating 200'000 nodes
with a LONG property set to distinct values and then executing the following query:

stuff//*[@foo] order by @foo

On average the execution time with the current codebase is 2200ms, with your initial patch:
265ms and with the latest patch: 235ms.

Can you please make sure you consistently use space characters instead of tabs in the source
code? Thanks.

Now, there's just one thing left. You introduced a cache for the ReadOnlyIndexReaders in AbstractIndex.
I'd rather not want to have a cache there because it means that we have to maintain it. In
your patch the map is cleaned when the index is invalidated. For older index segments (the
bigger ones, which resulted from index merges) this means that the map is only cleaned when
the index segment is closed (when it is part of a merge or on shutdown). IMO this is somewhat
of a memory leak and should be changed.

I would rather have these three lines at the beginning of the method SharedFieldCache.getStringIndex():

        if (reader instanceof ReadOnlyIndexReader) {
            reader = ((ReadOnlyIndexReader) reader).getBase();
        }

It doesn't win a price for beauty but has the same effect as the cache.

WDYT?

> Manage Lucene FieldCaches per index segment
> -------------------------------------------
>
>                 Key: JCR-974
>                 URL: https://issues.apache.org/jira/browse/JCR-974
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3
>            Reporter: Christoph Kiehl
>         Attachments: ItemStateManagerBasedSortComparator.patch, patch.txt, patch2.txt
>
>
> Jackrabbit uses an IndexSearcher which searches on a single IndexReader which is most
likely to be an instance of CachingMultiReader. On every search that does sorting or range
queries a FieldCache is populated and associated with this instance of a CachingMultiReader.
On successive queries which operate on this CachingMultiReader you will get a tremendous speedup
for queries which can reuse  those associated FieldCache instances.
> The problem is that Jackrabbit creates a new CachingMultiReader _everytime_ one of the
underlying indexes are modified. This means if you just change _one_ item in the repository
you will need to rebuild all those FieldCaches because the existing FieldCaches are associated
with the old instance of CachingMultiReader.
> This does not only lead to slow search response times for queries which contains range
queries or are sorted by a field but also leads to massive memory consumption (depending on
the size of your indexes) because there might be multiple instances of CachingMultiReaders
in use if you have a scenario where a lot of queries and item modifications are executed concurrently.
> The goal is to keep those FieldCaches as long as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message