jackrabbit-dev mailing list archives

From "Ard Schrijvers (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
Date Wed, 14 Nov 2007 11:13:43 GMT

    [ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542418 ]

Ard Schrijvers commented on JCR-1213:

"The first check (1) is the reason why you created this issue" 

Not entirely: currently, a WeakReference is kept to the CombinedIndexReader instance, and
that instance is recreated for every search. The MultiIndexReader instance is kept, AFAICS, as long
as all indexes stay the same. So, in SearchIndex, changing

    public int getParent(int n) throws IOException {
        ...
        return id.getDocumentNumber(this);
    }

to

    public int getParent(int n) throws IOException {
        ...
        return id.getDocumentNumber(subReaders[i]);
    }

would already implement (1). Check (1) holds only when *every* index reader instance is unchanged.
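
A minimal sketch of why the choice of reader instance matters for a cache keyed on a WeakReference. All class names here (SketchReader, SketchCachedDocId, CacheKeyDemo) are hypothetical stand-ins, not the Jackrabbit or Lucene classes, and the "expensive lookup" is faked with a counter:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an index reader; not the Lucene or Jackrabbit class.
class SketchReader {
    private final AtomicInteger lookups;

    SketchReader(AtomicInteger lookups) {
        this.lookups = lookups;
    }

    // Pretend this is the expensive UUID -> document number lookup.
    int findDocument(String uuid) {
        lookups.incrementAndGet();
        return 42; // fixed value, enough for the sketch
    }
}

// Hypothetical doc id that caches its document number per reader instance.
class SketchCachedDocId {
    private final String uuid;
    private WeakReference<SketchReader> cachedFor = new WeakReference<>(null);
    private int cachedDoc = -1;

    SketchCachedDocId(String uuid) {
        this.uuid = uuid;
    }

    int getDocumentNumber(SketchReader reader) {
        if (cachedFor.get() == reader) {
            return cachedDoc; // same reader instance: cache hit
        }
        // Different (or collected) reader instance: do the expensive lookup again.
        cachedDoc = reader.findDocument(uuid);
        cachedFor = new WeakReference<>(reader);
        return cachedDoc;
    }
}

class CacheKeyDemo {
    // Runs the given number of "searches" and returns how many expensive lookups happened.
    static int countLookups(boolean keyOnPerSearchReader, int searches) {
        AtomicInteger lookups = new AtomicInteger();
        SketchReader longLived = new SketchReader(lookups); // kept while the index is unchanged
        SketchCachedDocId id = new SketchCachedDocId("parent-uuid");
        for (int i = 0; i < searches; i++) {
            SketchReader perSearch = new SketchReader(lookups); // recreated for every search
            id.getDocumentNumber(keyOnPerSearchReader ? perSearch : longLived);
        }
        return lookups.get();
    }
}
```

With five searches, keying on the per-search instance costs five lookups, while keying on the long-lived instance costs one.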

If one of the instances has changed, we would need step (2), IIUC. Then we could check whether
the instance the parent was found in is still valid and, as you indicate, return the
'corrected' DocNumber, which might be different due to applyOffSet. When (1) and (2) are both
invalid, the search for the parent node in subReaders[i] should be done again.
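
The order of checks described above could be sketched roughly as follows. This is only an illustration under assumptions: SketchSegment, SketchMultiReader and SketchParentDocId are hypothetical names, and the offset handling is a simplified guess at what the offset correction amounts to, not the actual Jackrabbit code:

```java
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.Map;

// Hypothetical segment: maps a UUID to a document number local to the segment.
class SketchSegment {
    final Map<String, Integer> docs;
    final int maxDoc;

    SketchSegment(Map<String, Integer> docs, int maxDoc) {
        this.docs = docs;
        this.maxDoc = maxDoc;
    }
}

// Hypothetical multi reader: a document number is local number + segment offset.
class SketchMultiReader {
    final List<SketchSegment> segments;

    SketchMultiReader(List<SketchSegment> segments) {
        this.segments = segments;
    }

    // Offset of the given segment instance, or -1 if it is not part of this reader.
    int offsetOf(SketchSegment wanted) {
        int offset = 0;
        for (SketchSegment s : segments) {
            if (s == wanted) {
                return offset;
            }
            offset += s.maxDoc;
        }
        return -1;
    }
}

class SketchParentDocId {
    private final String uuid;
    private WeakReference<SketchMultiReader> reader = new WeakReference<>(null);
    private WeakReference<SketchSegment> segment = new WeakReference<>(null);
    private int localDoc = -1;
    private int doc = -1;
    int fullLookups = 0; // counts how often the full search had to run

    SketchParentDocId(String uuid) {
        this.uuid = uuid;
    }

    int getDocumentNumber(SketchMultiReader current) {
        // (1) Same multi reader instance: the cached number is still right.
        if (reader.get() == current) {
            return doc;
        }
        // (2) Reader changed: is the segment the parent was found in still part of it?
        SketchSegment s = segment.get();
        int offset = (s == null) ? -1 : current.offsetOf(s);
        if (offset < 0) {
            // (1) and (2) both failed: search all segments again.
            fullLookups++;
            s = null;
            int candidateOffset = 0;
            for (SketchSegment candidate : current.segments) {
                Integer local = candidate.docs.get(uuid);
                if (local != null) {
                    s = candidate;
                    localDoc = local;
                    offset = candidateOffset;
                    break;
                }
                candidateOffset += candidate.maxDoc;
            }
            if (s == null) {
                return -1; // not found; the cache is left untouched
            }
            segment = new WeakReference<>(s);
        }
        // Corrected document number: the segment offset may have shifted.
        doc = offset + localDoc;
        reader = new WeakReference<>(current);
        return doc;
    }
}
```

In this sketch, when a small segment in front of the parent's segment disappears, step (2) returns the shifted number without a new search; only when the parent's segment itself is replaced does the full lookup run again.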

I agree that (1) is redundant because (2) captures (1), but I added it because, first of
all, it is something we can add right away, and secondly, because I think (but I should measure)
that if the subReaders[i] instance (MultiIndexReader) did not change, it is useless to
look up the index reader segment the parent was in and check whether that instance is still
valid.
I do agree with you that if removing (1) does not imply any performance loss, we should only
go for (2). But it is not correct that (1) does nothing for the original problem:
instead of the CombinedIndexReader, which is recreated all the time, I pass in the MultiIndexReader,
whose instance is kept as long as no indexes change. This is at least what I understand of
the mechanism, but I am not as familiar with it as you are, of course, so I might be off.

> UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
> ---------------------------------------------------------------------------------------------------------------------------
>                 Key: JCR-1213
>                 URL: https://issues.apache.org/jira/browse/JCR-1213
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.3
>            Reporter: Ard Schrijvers
>             Fix For: 1.4
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of getParent() functions
> to know whether the parents are correct and if the result is allowed. getParent() is called
> recursively for every hit, and can become very expensive. Hence, in DocId.UUIDDocId, the parents
> are cached.
> Currently, DocId.UUIDDocId's are cached by having a WeakReference to the CombinedIndexReader,
> but this CombinedIndexReader is recreated all the time, implying that a gc() is allowed to
> remove the 'expensive' cache.
> A much better solution is to not have a WeakReference to the CombinedIndexReader, but
> a reference to each index reader segment. This means that in getParent(int n) in SearchIndex
> the statement
> return id.getDocumentNumber(this) needs to be replaced by return id.getDocumentNumber(subReaders[i]);
> and something similar is needed in CachingMultiReader.
> That is all. Obviously, when a node/property is added/removed/changed, some parts of
> the cached DocId.UUIDDocId will be invalid, but it is mainly small indexes that are updated frequently,
> and those are obviously less expensive to recompute.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
