jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergiy Shyrkov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (JCR-3572) Optimization in NodeImpl.getNodes(String) and getNodes(String[])
Date Wed, 17 Apr 2013 10:37:15 GMT
Sergiy Shyrkov created JCR-3572:
-----------------------------------

             Summary: Optimization in NodeImpl.getNodes(String) and getNodes(String[])
                 Key: JCR-3572
                 URL: https://issues.apache.org/jira/browse/JCR-3572
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
    Affects Versions: 2.6, 2.5.3, 2.4.3, 2.2.13
            Reporter: Sergiy Shyrkov
            Priority: Minor


We were doing some refactoring/optimization of our code (we are using Jackrabbit 2.2.x code)
and replacing in some places traversal over child nodes (node.getNodes()) with node.getNodes(namePattern)
as it suppose to speed up the things by only returning children, matching particular name
pattern (in our case those our internationalization nodes, starting with "translation-").

Profiling the code after we noticed that node.getNodes(namePattern) is actually slower than
simple traversal over all children checking their name match.

We can down to the org.apache.jackrabbit.util.ChildrenCollectorFilter which is used in the
NodeImpl.getNodes(String) and getNodes(String[])

Even if the instance of the filter is created with collectProperties set to false the parent
(javax.jcr.util.TraversingItemVisitor.visit(Node)) still reads the properties of the nodes
and iterates over them which seems useless.

Digging deeper it seems there is an optimization which can be done to getNodes(String) and
getNodes(String[]) earlier.

Those methods could call the itemMgr.getChildNodes(), like the getNodes() does, passing additionally
namePattern / nameGlobs.
The ItemManager.ChildNodeEntry() already iterates on ChildNodeEntry instances, which have
Name, so the filtering can be done here, i.e. much earlier, than in the LazyItemIterator.prefetchNext().

The adjusted code for org.apache.jackrabbit.core.ItemManager.getChildNodes(), we've tested
on our side looks as follows:

    synchronized NodeIterator getChildNodes(NodeId parentId)
            throws ItemNotFoundException, AccessDeniedException, RepositoryException {
        return getChildNodesInternal(parentId, null, null);
    }

    synchronized NodeIterator getChildNodes(NodeId parentId, String namePattern, String[]
nameGlobs)
            throws ItemNotFoundException, AccessDeniedException, RepositoryException {
        return getChildNodesInternal(parentId, namePattern, nameGlobs);
    }

    private NodeIterator getChildNodesInternal(NodeId parentId, String namePattern, String[]
nameGlobs)
            throws ItemNotFoundException, AccessDeniedException, RepositoryException {
        sanityCheck();

        ItemData data = getItemData(parentId);
        if (!data.isNode()) {
            String msg = "can't list child nodes of property " + parentId;
            log.debug(msg);
            throw new RepositoryException(msg);
        }
        ArrayList<ItemId> childIds = new ArrayList<ItemId>();
        Iterator<ChildNodeEntry> iter = ((NodeState) data.getState()).getChildNodeEntries().iterator();

        while (iter.hasNext()) {
            ChildNodeEntry entry = iter.next();
            // delay check for read-access until item is being built
            // thus avoid duplicate check
            
            // check for name
            if (namePattern == null && nameGlobs == null ||
                    namePattern != null
                    && ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()),
namePattern)
                    || nameGlobs != null
                    && ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()),
nameGlobs)) {
                childIds.add(entry.getId());
            }
        }

        return new LazyItemIterator(sessionContext, childIds, parentId);
    }


and in the NodeImpl we call ItemManager.getChildNodes() as follows:


    public NodeIterator getNodes(String namePattern) throws RepositoryException {
        return getNodesInternal(namePattern, null);
    }

    public NodeIterator getNodes(final String[] nameGlobs)
            throws RepositoryException {
        return getNodesInternal(null, nameGlobs);
    }

    private NodeIterator getNodesInternal(final String namePattern, final String[] nameGlobs)
throws RepositoryException {
        return perform(new SessionOperation<NodeIterator>() {
            public NodeIterator perform(SessionContext context)
                    throws RepositoryException {
                try {
                    return itemMgr.getChildNodes((NodeId) id, namePattern, nameGlobs);
                } catch (ItemNotFoundException e) {
                    throw new RepositoryException(
                            "Failed to list child nodes of " + NodeImpl.this, e);
                } catch (AccessDeniedException e) {
                    throw new RepositoryException(
                            "Failed to list child nodes of " + NodeImpl.this, e);
                }
            }
            public String toString() {
                return "node.getNodes()";
            }
        });
    }



Could you please provide your feedback if the optimization makes sense and looks consistent
or if there are any issues with those changes?

I could provide the patch of suggested changes. I just need to know against which branch of
Jackrabbit code it should be done.

Thank you in advance!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message