jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
Date Mon, 19 Nov 2007 10:55:43 GMT

    [ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543520

Ard Schrijvers commented on JCR-1213:

"I think whatever UUIDDocId calculates should be independent of the multi index reader. That
is, it should only hold the document number as retrieved from the index segment. Then in a
second step an offset should be applied"

Yes, this is probably the cleanest way. I now also understand why we had the discussion about
how to solve the issue. You were already thinking about computing the docNumber in a second
step, hence, all that matters is the segment instance. 

So, the latter part, about the segment instance I did build, though we might discuss wether
it is a good way. I added a method to MultiIndexReader interface, 

public boolean hasIndexReaderInstance(IndexReader indexReader);

and in CachingMultiReader and CombinedIndexReader I keep track of subreaders instances with
an IdentityHashMap().

In UUIDDocId I can find the reader instance the doc was found in by changing SingleTermDocs
by having a reference to its segment reader. Obviously, now I have to cast reader.termDocs(id)
to SingleTermDocs which we might not like.

Anyway, I'll try to add the second step offset in calculating the docNumber as you suggested
somewhere this week, and create a patch (might be easier than talking about a solution). 

> UUIDDocId cache does not work properly because of weakReferences in combination with
new instance for combined indexreader 
> ---------------------------------------------------------------------------------------------------------------------------
>                 Key: JCR-1213
>                 URL: https://issues.apache.org/jira/browse/JCR-1213
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.3
>            Reporter: Ard Schrijvers
>             Fix For: 1.4
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of getParent() functions
to know wether the parents are correct and if the result is allowed. The getParent() is called
recursively for every hit, and can become very expensive. Hence, in DocId.UUIDDocId, the parents
are cached. 
> Currently,  docId.UUIDDocId's are cached by having a WeakRefence to the CombinedIndexReader,
but, this CombinedIndexReader is recreated all the time, implying that a gc() is allowed to
remove the 'expensive' cache.
> A much better solution is to not have a weakReference to the CombinedIndexReader, but
to a reference of each indexreader segment. This means, that in getParent(int n) in SearchIndex
the return 
> return id.getDocumentNumber(this) needs to be replaced by return id.getDocumentNumber(subReaders[i]);
and something similar in CachingMultiReader. 
> That is all. Obviously, when a node/property is added/removed/changed, some parts of
the cached DocId.UUIDDocId will be invalid, but mainly small indexes are updated frequently,
which obviously are less expensive to recompute.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message