jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Paths in Oak
Date Fri, 01 Nov 2013 12:43:30 GMT

On Fri, Nov 1, 2013 at 1:18 AM, Tobias Bocanegra <tripod@apache.org> wrote:
> I debugged a simple call like:
> session.getProperty("/a/b/foo").getString();
> and was really astonished how many path conversion, checking,
> manipulation and tree/parent/child access operations are performed. Not
> only in access control, but everywhere. It looks like most of the
> time, Oak is busy converting, checking and truncating string paths :-)

Indeed! This has been a recurring topic, see
https://issues.apache.org/jira/browse/OAK-978 for the latest version
of that debate (and https://issues.apache.org/jira/browse/OAK-1015 for

> From stepping through the code, it looks like the full path is not used
> very often, but rather the individual segments. Maybe it would make
> sense to re-use Jackrabbit's Name and Path classes.

One of the performance/memory issues I spent a lot of time on with
Jackrabbit was optimizing the internal Name and Path classes and I'm
still not happy with the overhead of all the parsing/serialization and
extra objects we need there.

In Oak it has been an explicit goal to avoid those conversions for the
common case where no namespace remappings are present. Then in most
cases it should be possible to just keep the original path string
passed by the client and use String.substring() for the individual
path segments.
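A minimal sketch of the substring approach described above, assuming an absolute, already-validated path (e.g. "/a/b/foo") and no namespace remappings. PathSegments and its methods are hypothetical names for illustration, not Oak classes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not an Oak API): work directly on the client's path
 * string with indexOf()/substring() instead of parsing it into Name/Path
 * objects. Assumes the path is absolute, valid and needs no prefix remapping.
 */
public final class PathSegments {

    private PathSegments() {}

    /** Returns the segments of an absolute path, e.g. "/a/b/foo" -> [a, b, foo]. */
    public static List<String> segments(String path) {
        List<String> result = new ArrayList<>();
        int start = 1; // skip the leading '/'
        while (start < path.length()) {
            int end = path.indexOf('/', start);
            if (end == -1) {
                end = path.length(); // last segment runs to the end of the string
            }
            result.add(path.substring(start, end));
            start = end + 1;
        }
        return result;
    }

    /** Returns the last segment (the item name) without splitting the whole path. */
    public static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Returns the parent path, e.g. "/a/b/foo" -> "/a/b"; the parent of "/x" is "/". */
    public static String parent(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    public static void main(String[] args) {
        String path = "/a/b/foo";
        System.out.println(segments(path)); // [a, b, foo]
        System.out.println(name(path));     // foo
        System.out.println(parent(path));   // /a/b
    }
}
```

Note that since Java 7u6, String.substring() copies the selected characters instead of sharing the original backing array, so each segment is still a small allocation, though a much lighter one than a parsed Name or Path object.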

To make this work really smoothly and efficiently, we need to separate
the tasks of path/name mapping (i.e. namespace prefixes) from
path/name validation (checking whether names are valid). Unfortunately
those tasks are currently mixed in NamePathMapper (which is what
you're seeing), and we haven't yet taken on the non-trivial effort of
untangling them. Perhaps we should try now.
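One way the two concerns could be kept apart is sketched below, assuming a session-local map from JCR prefixes to Oak prefixes. PathMapper, PathValidator and MapThenValidate are hypothetical names, not Oak types, and the validator is deliberately incomplete. The point is that with no remappings the mapper hands back the client's string unchanged, and validation is a separate step that never rewrites anything:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: namespace mapping and name validation as separate steps. */
public final class MapThenValidate {

    /** Rewrites namespace prefixes only; never checks well-formedness. */
    interface PathMapper {
        String toOak(String jcrPath);
    }

    /** Checks well-formedness only; never rewrites anything. */
    interface PathValidator {
        boolean isValid(String path);
    }

    /** Builds a mapper from session-local remappings (JCR prefix -> Oak prefix). */
    static PathMapper mapper(Map<String, String> remappings) {
        if (remappings.isEmpty()) {
            // Common case: no remappings, so the original String instance is returned.
            return jcrPath -> jcrPath;
        }
        return jcrPath -> {
            List<String> out = new ArrayList<>();
            for (String segment : jcrPath.split("/", -1)) {
                int colon = segment.indexOf(':');
                String mapped = colon < 0 ? null : remappings.get(segment.substring(0, colon));
                // Replace only the prefix; keep ":localName" as-is.
                out.add(mapped == null ? segment : mapped + segment.substring(colon));
            }
            return String.join("/", out);
        };
    }

    /** Deliberately minimal checks: absolute, no empty segments, no [ ] | * characters. */
    static final PathValidator VALIDATOR = path -> {
        if (path.equals("/")) {
            return true;
        }
        if (!path.startsWith("/") || path.endsWith("/")) {
            return false;
        }
        for (String segment : path.substring(1).split("/", -1)) {
            if (segment.isEmpty() || segment.matches(".*[\\[\\]|*].*")) {
                return false;
            }
        }
        return true;
    };

    public static void main(String[] args) {
        String p = "/a/b/foo";
        PathMapper identity = mapper(Collections.<String, String>emptyMap());
        System.out.println(identity.toOak(p) == p); // true: same String instance

        Map<String, String> remap = new HashMap<>();
        remap.put("my", "jcr");
        System.out.println(mapper(remap).toOak("/my:content/foo")); // /jcr:content/foo

        System.out.println(VALIDATOR.isValid(p));         // true
        System.out.println(VALIDATOR.isValid("/a//foo")); // false
    }
}
```

Keeping the validator separate means a caller that already trusts its input (for example a path it built itself) can skip validation entirely, while the mapper stays a no-op in the common case.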

> Further, getProperty().getString() actually fetches the property twice
> and also checks access control twice: once in the getProperty() call,
> and once when fetching the value. I would assume that the value is
> already stored in the PropertyState after the getProperty() call.

A while ago we'd actually fetch the property a dozen times (see
http://markmail.org/message/v2ydm5xksd25glm4), so twice is already a
major improvement... Though of course you're right that there should
only need to be a single PropertyState lookup for that pattern.
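The single-lookup pattern could look roughly like the sketch below: the property object captures the state that getProperty() already resolved and access-checked, and getString() just reads it. All classes here (including this PropertyState stand-in) are hypothetical and not Oak's actual types; the lookup counter only makes the number of lower-level calls visible:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve and access-check once, then reuse the state. */
public final class SingleLookup {

    /** Stand-in for an immutable property state (not Oak's PropertyState interface). */
    static final class PropertyState {
        final String name;
        final String value;

        PropertyState(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Stand-in for the lower layers; counts path resolutions plus access checks. */
    static final class Store {
        final Map<String, PropertyState> data = new HashMap<>();
        int lookups = 0;

        PropertyState lookup(String path) {
            lookups++; // one path resolution and one permission check per call
            PropertyState state = data.get(path);
            if (state == null) {
                throw new IllegalArgumentException("No property at " + path);
            }
            return state;
        }
    }

    /** Keeps the state it was created from, so reading the value needs no second lookup. */
    static final class Property {
        private final PropertyState state;

        Property(PropertyState state) {
            this.state = state;
        }

        String getString() {
            return state.value;
        }
    }

    static final class Session {
        private final Store store;

        Session(Store store) {
            this.store = store;
        }

        Property getProperty(String path) {
            return new Property(store.lookup(path));
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.data.put("/a/b/foo", new PropertyState("foo", "bar"));
        Session session = new Session(store);
        System.out.println(session.getProperty("/a/b/foo").getString()); // bar
        System.out.println(store.lookups); // 1
    }
}
```

This sketch ignores staleness: a JCR Property obtained earlier is expected to reflect later changes or removal within the same session, so a real implementation would need some cheap way to detect that the captured state is out of date rather than simply trusting it.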

That's the kind of unnecessary work I was referring to in my comment
to OAK-1138. Before spending too much effort optimizing the lower
levels of code, we should ensure that higher levels of code aren't
causing duplicate or unnecessary work to be performed. Not doing
something at all is always faster than trying to optimize it.


Jukka Zitting
