Return-Path: X-Original-To: apmail-jackrabbit-dev-archive@www.apache.org Delivered-To: apmail-jackrabbit-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 37A4EF919 for ; Thu, 4 Apr 2013 10:15:16 +0000 (UTC) Received: (qmail 82074 invoked by uid 500); 4 Apr 2013 10:15:16 -0000 Delivered-To: apmail-jackrabbit-dev-archive@jackrabbit.apache.org Received: (qmail 81930 invoked by uid 500); 4 Apr 2013 10:15:15 -0000 Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@jackrabbit.apache.org Delivered-To: mailing list dev@jackrabbit.apache.org Received: (qmail 81900 invoked by uid 99); 4 Apr 2013 10:15:15 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Apr 2013 10:15:15 +0000 Date: Thu, 4 Apr 2013 10:15:15 +0000 (UTC) From: "Lukas Eder (JIRA)" To: dev@jackrabbit.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (JCR-3557) Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/JCR-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621996#comment-13621996 ] Lukas Eder commented on JCR-3557: --------------------------------- ... note, I'm aware of some implementations (including the one from the screenshot) using NODE_ID_HI and NODE_ID_LO, which makes using the "IN predicate" a bit more tricky. Some databases support tuples, e.g. (NODE_ID_HI, NODE_ID_LO) IN ((?, ?), (?, ?), (..., ...), ...) Others would have to resort to "expanding" the tuples: (NODE_ID_HI = ? AND NODE_ID_LO = ?) OR (NODE_ID_HI = ? AND NODE_ID_LO = ?) OR (...) Besides, there are lots of other limits to consider: - Oracle's IN predicate can only contain 1000 elements - SQL Server and Sybase have a limit of around 2000 bind variables per query See also: http://stackoverflow.com/q/11266609/521799 > Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager > ----------------------------------------------------------------------------- > > Key: JCR-3557 > URL: https://issues.apache.org/jira/browse/JCR-3557 > Project: Jackrabbit Content Repository > Issue Type: Bug > Components: jackrabbit-core > Affects Versions: 2.6 > Reporter: Lukas Eder > Priority: Minor > Attachments: Yourkit_Screenshot_SQL.png, Yourkit_Screenshot_Stacktrace.png > > > Deleting a tree of nodes in Jackrabbit results in a prohibitive amount of SQL queries being executed when using the BundleDbPersistenceManager. Please consider the attached Screenshots from Yourkit Profiler, which illustrate how saving the deletion of a single node through the JCR API results in 34 SQL DELETE statements for my test case: > - Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. Please disregard the absolute values of the time column. It is biased by the profiling session. > - Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by the selected stack subtree > While other operations are probably not very optimal either, this one is a very low hanging fruit, in my opinion. Consider the following piece of code in AbstractBundlePersistenceManager.storeInternal(): > {code} > for (ItemState state : changeLog.deletedStates()) { > if (state.isNode()) { > NodePropBundle bundle = getBundle((NodeId) state.getId()); > if (bundle == null) { > throw new NoSuchItemStateException(state.getId().toString()); > } > deleteBundle(bundle); > deleted.add(state.getId()); > } > } > {code} > Now, instead of iterating over deletedStates and loading / deleting them one by one, they should be bulk-loaded / bulk-deleted as such: > {code} > List nodeIds = new ArrayList(); > for (ItemState state : changeLog.deletedStates()) { > if (state.isNode()) { > nodeIds.add((NodeId) state.getId()); > } > } > List bundles = new ArrayList(); > bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException > deleteBundles(bundles); > deleted.addAll(nodeIds); > {code} > For backwards-compatibility and convenience, AbstractBundlePersistenceManager would provide trivial default implementations for getBundles() and deleteBundles(): > {code} > protected List getBundles(Collection nodeIds) throws ItemStateException { > List result = new ArrayList(); > for (NodeId nodeId : nodeIds) { > result.add(getBundle(nodeId)); > } > return result; > } > protected void deleteBundles(Collection bundles) throws ItemStateException { > for (NodePropBundle bundle : bundles) { > deleteBundle(bundle); > } > } > {code} > But the BundleDbPersistenceManager could override this default behaviour by executing something like > {code} > -- getBundles() > select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...) > -- deleteBundles() > delete from BUNDLE where NODE_ID in (?, ?, ...) > {code} > The same also applies for deleting from the REFS table. > If this seems like a viable improvement and if I didn't overlook something subtle where one-by-one selection / deletion of bundle data is important, I could go ahead and provide a patch for this issue. Once such an improvement is in place, other bundle persistence managers might take advantage of the new algorithm as well. Also, we could review adding / updating bundles in another JIRA ticket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira