jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
Date Wed, 14 Nov 2007 10:41:43 GMT

    [ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542406

Marcel Reutegger commented on JCR-1213:

Ard wrote:
> Am thinking about a two step check, where first a reference to the entire
> MultiIndexReader is checked.
> IF (1): check reference to the entire MultiIndexReader instance is positive,
> return cached results. ELSE IF (2) :check the index reader segment instance the
> parent docnumber was in: if instance present, recompute docNumber with
> respect to the new offsets in MultiIndexReader and return (almost) cached
> result. ELSE (3): recompute docNumber by search in MultiIndexReader (the
> uncached case)

I would rather make it simple and just use one approach. The first check (1) is the reason
why you created this issue. I think we should therefore remove this check and replace it with
something that works better. The additional check (2) is not necessary IMO because we have
DocId.applyOffset(), though we will have to modify the signature because the code currently
assumes that the returned DocId is tied to the index segment it originated from. Thus the
offset for that segment is passed at applyOffset(). Because the UUIDDocId also relates to
another index segment (parent points to another segment) the applyOffset() in its current
form is useless for the UUIDDocId implementation. Maybe we should pass in the complete information.
All readers and their offsets. This allows the UUIDDocId to apply an offset properly. As for
(3), this is again part of the issue we should try to solve, thus we should not relate a UUIDDocId
to the MultiIndexReader but to a index segment/reader within

I'll also start with a diagram explaining the various readers and how they relate to each

> UUIDDocId cache does not work properly because of weakReferences in combination with
new instance for combined indexreader 
> ---------------------------------------------------------------------------------------------------------------------------
>                 Key: JCR-1213
>                 URL: https://issues.apache.org/jira/browse/JCR-1213
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.3
>            Reporter: Ard Schrijvers
>             Fix For: 1.4
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of getParent() functions
to know wether the parents are correct and if the result is allowed. The getParent() is called
recursively for every hit, and can become very expensive. Hence, in DocId.UUIDDocId, the parents
are cached. 
> Currently,  docId.UUIDDocId's are cached by having a WeakRefence to the CombinedIndexReader,
but, this CombinedIndexReader is recreated all the time, implying that a gc() is allowed to
remove the 'expensive' cache.
> A much better solution is to not have a weakReference to the CombinedIndexReader, but
to a reference of each indexreader segment. This means, that in getParent(int n) in SearchIndex
the return 
> return id.getDocumentNumber(this) needs to be replaced by return id.getDocumentNumber(subReaders[i]);
and something similar in CachingMultiReader. 
> That is all. Obviously, when a node/property is added/removed/changed, some parts of
the cached DocId.UUIDDocId will be invalid, but mainly small indexes are updated frequently,
which obviously are less expensive to recompute.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message