From: "Ard Schrijvers (JIRA)"
To: dev@jackrabbit.apache.org
Date: Fri, 23 Nov 2007 05:43:43 -0800 (PST)
Subject: [jira] Commented: (JCR-1213) UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
Message-ID: <4546605.1195825423147.JavaMail.jira@brutus>
In-Reply-To: <4587580.1194865070552.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/JCR-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545025 ]

Ard Schrijvers commented on JCR-1213:
-------------------------------------

Aaaah yes, you are right.

ATM I have a working test version that seems to solve this issue: it keeps consecutive DescendantSelfAxisWeight/ChildAxisQuery queries fast after gc() has done its work (so the WeakReferences are correct now), and it is also fast when nodes are incrementally added to or deleted from the index.

To test the performance improvement, you need a *large* repository (for example 1.000.000 nodes) where parent nodes are frequently found in different indexes. Then queries like xpath = "//documents//*[@caption]", where many nodes have this property, will be much faster in consecutive runs. A query like "//documents//*[@date]" that has many parents in common with the @caption query should then run fast as well.

The initially 'slow' DescendantSelfAxisWeight/ChildAxisQuery remains a problem. OTOH, we might do some cache warming when we start the repository. Since CachingIndexReaders are kept for the lifetime of a persistent index, that might be quite useful.

Here is also what I find confusing about "http://jackrabbit.apache.org/doc/arch/operate/index-readers.html". It says: "A SharedIndexReader is kept open for the entire lifetime of a PersistentIndex", but AFAIU the CachingIndexReader, which is wrapped by the SharedIndexReader, is already kept for the lifetime of a PersistentIndex, and the SharedIndexReader is merely kept for the lifetime of all running requests by reference counting. (If I am correct, I can change the documentation slightly.)

Furthermore, I tested the impact of the step (1) --> step (2) check: first check the reference to the MultiIndexReader (if valid, return docNumber instantly) or, when it is invalid but the segment reader is valid, recompute docNumber.
If I remove step (1) I see no performance change; therefore I will refactor to only have step (2), and I will always recompute the actual docNumber.

I'll try to have a patch for testing ready today. I have not invested

"Thinking more about this issue it might be worth looking at an alternative. There is a DocNumberCache, which maps a UUID to a CachingIndexReader with a document number. This is exactly the information that is also present in a UUIDDocId. So we might just as well not cache the result in UUIDDocId but always use the DocNumberCache to resolve it. However I'm not sure how much overhead that adds. I'll have to investigate that first..."

> UUIDDocId cache does not work properly because of weakReferences in combination with new instance for combined indexreader
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: JCR-1213
> URL: https://issues.apache.org/jira/browse/JCR-1213
> Project: Jackrabbit
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.3.3
> Reporter: Ard Schrijvers
> Fix For: 1.4
>
>
> Queries that use ChildAxisQuery or DescendantSelfAxisQuery make use of getParent() functions to know whether the parents are correct and if the result is allowed. The getParent() is called recursively for every hit, and can become very expensive. Hence, in DocId.UUIDDocId, the parents are cached.
> Currently, DocId.UUIDDocId's are cached by having a WeakReference to the CombinedIndexReader, but this CombinedIndexReader is recreated all the time, implying that a gc() is allowed to remove the 'expensive' cache.
> A much better solution is to not have a WeakReference to the CombinedIndexReader, but one to each indexreader segment. This means that in getParent(int n) in SearchIndex the return
> return id.getDocumentNumber(this) needs to be replaced by return id.getDocumentNumber(subReaders[i]); and something similar in CachingMultiReader.
> That is all.
> Obviously, when a node/property is added/removed/changed, some parts of the cached DocId.UUIDDocId will be invalid, but it is mainly the small indexes that are updated frequently, and those are less expensive to recompute.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
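
The caching idea from the issue description can be illustrated with a small, self-contained sketch. It is not Jackrabbit code: Reader and CachedDocNumber are hypothetical stand-ins, and the sketch only models the validity check by reader identity (it does not reproduce gc() clearing the WeakReference). It assumes that a combined reader is a fresh instance for every query while a segment reader lives on, and shows why a cached document number tied to the former is never reusable.

```java
import java.lang.ref.WeakReference;

public class DocIdCacheSketch {

    /** Hypothetical stand-in for an index reader; not a Jackrabbit or Lucene class. */
    static class Reader {
        final String name;
        Reader(String name) { this.name = name; }
    }

    /** Hypothetical cached lookup: a document number that is valid for one reader only. */
    static class CachedDocNumber {
        private final WeakReference<Reader> reader;
        private final int docNumber;

        CachedDocNumber(Reader reader, int docNumber) {
            this.reader = new WeakReference<>(reader);
            this.docNumber = docNumber;
        }

        /** Returns the cached number if it was computed against this exact reader, else -1. */
        int get(Reader current) {
            return reader.get() == current ? docNumber : -1;
        }
    }

    public static void main(String[] args) {
        Reader segment = new Reader("segment-0");        // assumed long-lived
        Reader combinedQuery1 = new Reader("combined");  // assumed created per query

        CachedDocNumber keyedOnCombined = new CachedDocNumber(combinedQuery1, 42);
        CachedDocNumber keyedOnSegment = new CachedDocNumber(segment, 42);

        // The next query gets a new combined reader instance, but the same segment reader.
        Reader combinedQuery2 = new Reader("combined");

        System.out.println("keyed on combined reader: " + keyedOnCombined.get(combinedQuery2));
        System.out.println("keyed on segment reader:  " + keyedOnSegment.get(segment));
    }
}
```

In this sketch the entry keyed on the combined reader misses on the second query, because the reader it refers to is no longer the one in use, while the entry keyed on the segment reader still hits. In the real code the cached number would be relative to the segment and would have to be mapped into the combined reader's numbering, which the sketch leaves out.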