jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lukas Eder (JIRA)" <j...@apache.org>
Subject [jira] [Created] (JCR-3557) Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
Date Thu, 04 Apr 2013 09:41:15 GMT
Lukas Eder created JCR-3557:

             Summary: Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
                 Key: JCR-3557
                 URL: https://issues.apache.org/jira/browse/JCR-3557
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
    Affects Versions: 2.6
            Reporter: Lukas Eder
            Priority: Minor
         Attachments: Yourkit_Screenshot_SQL.png, Yourkit_Screenshot_Stacktrace.png

Deleting a tree of nodes in Jackrabbit results in a prohibitive amount of SQL queries being
executed when using the BundleDbPersistenceManager. Please consider the attached Screenshots
from Yourkit Profiler, which illustrate how saving the deletion of a single node through the
JCR API results in 34 SQL DELETE statements for my test case:

- Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. Please disregard
the absolute values of the time column. It is biased by the profiling session.
- Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by the selected stack

While other operations are probably not very optimal either, this one is a very low hanging
fruit, in my opinion. Consider the following piece of code in AbstractBundlePersistenceManager.storeInternal():

for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        NodePropBundle bundle = getBundle((NodeId) state.getId());
        if (bundle == null) {
            throw new NoSuchItemStateException(state.getId().toString());

Now, instead of iterating over deletedStates and loading / deleting them one by one, they
should be bulk-loaded / bulk-deleted as such:

List<NodeId> nodeIds = new ArrayList<NodeId>();
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        nodeIds.add((NodeId) state.getId());
List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException

For backwards-compatibility and convenience, AbstractBundlePersistenceManager would provide
trivial default implementations for getBundles() and deleteBundles():

protected List<NodePropBundle> getBundles(Collection<? extends NodeId> nodeIds)
throws ItemStateException {
    List<NodePropBundle> result = new ArrayList<NodePropBundle>();

    for (NodeId nodeId : nodeIds) {

    return result;

protected void deleteBundles(Collection<? extends NodePropBundle> bundles) throws ItemStateException
    for (NodePropBundle bundle : bundles) {

But the BundleDbPersistenceManager could override this default behaviour by executing something

-- getBundles()
select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)

-- deleteBundles()
delete from BUNDLE where NODE_ID in (?, ?, ...)

The same also applies for deleting from the REFS table.

If this seems like a viable improvement and if I didn't overlook something subtle where one-by-one
selection / deletion of bundle data is important, I could go ahead and provide a patch for
this issue. Once such an improvement is in place, other bundle persistence managers might
take advantage of the new algorithm as well. Also, we could review adding / updating bundles
in another JIRA ticket.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message