jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
Date Wed, 21 Nov 2007 10:12:43 GMT

    [ https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544401 ]

Marcel Reutegger commented on JCR-1213:
---------------------------------------

I recently added some documentation to the website about the index readers:

http://jackrabbit.apache.org/doc/arch/operate/index-readers.html

> to be honest, I cannot yet grasp the big picture about keeping track of the deleted bitset

The new documentation shows how and when the deleted bit set for the ReadOnlyIndexReader is
created.

The ReadOnlyIndexReaders are indeed constructed on every change. That's very unfortunate and
should be changed. I'll create an issue for that. While this will fix the case where a ReadOnlyIndexReader
is re-constructed even though nothing changed in that segment, we will still have the issue
that a new ReadOnlyIndexReader is constructed if a node is deleted in that segment. Even in
that case we don't want to re-calculate all the UUIDDocIds that point to this segment.

> So, instead of using a WeakReference on the multiReader segments, I could get the sharedReader
instance out of it

Yes, that's probably the only way to keep the UUIDDocIds valid as long as possible. I
chose a similar approach in CachingMultiReader.termDocs(Term). The relation between the shared
reader and the read only reader is held in readersByBase. But that's quite ugly.
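
To illustrate the idea of tying the cached document number to the shared segment reader rather than to a combined reader that is recreated on every change, here is a minimal sketch. It is not the Jackrabbit code: SegmentReader and CachedDocId are hypothetical stand-ins, and the lookup method is an assumed placeholder for the expensive UUID resolution.

```java
import java.lang.ref.WeakReference;

/** Hypothetical stand-in for a per-segment (shared) index reader. */
interface SegmentReader {
    /** Assumed expensive lookup of the document number for a UUID, or -1 if absent. */
    int lookup(String uuid);
}

/** Illustrative doc id that caches its document number per segment reader. */
final class CachedDocId {
    private final String uuid;
    // Weak reference, so the cache never keeps a discarded reader alive.
    private WeakReference<SegmentReader> cachedFor = new WeakReference<>(null);
    private int docNumber = -1;

    CachedDocId(String uuid) {
        this.uuid = uuid;
    }

    int getDocumentNumber(SegmentReader segment) {
        // Identity check: the cached number is only valid for this exact reader instance.
        if (cachedFor.get() != segment) {
            docNumber = segment.lookup(uuid);
            cachedFor = new WeakReference<>(segment);
        }
        return docNumber;
    }
}
```

As long as the same segment reader instance is passed in, repeated calls reuse the cached number. Only a different instance triggers a new lookup, so the cache survives for as long as the segment reader itself does.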

Thinking more about this issue, it might be worth looking at an alternative. There is a DocNumberCache,
which maps a UUID to a CachingIndexReader with a document number. This is exactly the information
that is also present in a UUIDDocId. So we might just as well not cache the result in UUIDDocId
but always use the DocNumberCache to resolve it. However, I'm not sure how much overhead that
adds. I'll have to investigate that first...
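
A rough sketch of what resolving through a central UUID cache could look like, to make the alternative concrete. This is not the actual DocNumberCache API: SimpleDocNumberCache, CachedDoc and the LRU eviction policy are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache that maps a UUID to a reader plus a document number. */
final class SimpleDocNumberCache {

    /** Cached pair of the owning reader and the document number within it. */
    static final class CachedDoc {
        final Object reader;   // stands in for a CachingIndexReader
        final int docNumber;

        CachedDoc(Object reader, int docNumber) {
            this.reader = reader;
            this.docNumber = docNumber;
        }
    }

    private final Map<String, CachedDoc> docs;

    SimpleDocNumberCache(final int maxSize) {
        // Access-ordered LinkedHashMap: evicts the least recently used entry when full.
        this.docs = new LinkedHashMap<String, CachedDoc>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedDoc> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized void put(String uuid, Object reader, int docNumber) {
        docs.put(uuid, new CachedDoc(reader, docNumber));
    }

    /** Returns the cached entry for the UUID, or null if it is not cached. */
    synchronized CachedDoc get(String uuid) {
        return docs.get(uuid);
    }
}
```

With this shape, a doc id would hold only the UUID and ask the cache on every resolution. The overhead in question is then one synchronized map lookup per call.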

> UUIDDocId cache does not work properly because of weakReferences in combination with
new instance for combined indexreader 
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-1213
>                 URL: https://issues.apache.org/jira/browse/JCR-1213
>             Project: Jackrabbit
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.3
>            Reporter: Ard Schrijvers
>             Fix For: 1.4
>
>
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of getParent() functions
to know whether the parents are correct and if the result is allowed. The getParent() is called
recursively for every hit, and can become very expensive. Hence, in DocId.UUIDDocId, the parents
are cached. 
> Currently, DocId.UUIDDocIds are cached by having a WeakReference to the CombinedIndexReader,
but this CombinedIndexReader is recreated all the time, implying that a gc() is allowed to
remove the 'expensive' cache.
> A much better solution is to not have a WeakReference to the CombinedIndexReader, but
one to each index reader segment. This means that in getParent(int n) in SearchIndex
the statement 
> return id.getDocumentNumber(this) needs to be replaced by return id.getDocumentNumber(subReaders[i]);
and something similar in CachingMultiReader. 
> That is all. Obviously, when a node/property is added/removed/changed, some parts of
the cached DocId.UUIDDocId will be invalid, but mainly small indexes are updated frequently,
which obviously are less expensive to recompute.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

