jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: On custom index configuration
Date Wed, 19 Sep 2012 09:03:08 GMT
Hi,

On Wed, Sep 19, 2012 at 8:56 AM, Thomas Mueller <mueller@adobe.com> wrote:
>>At query time, when it knows the main path constraint used in the
>>query, it can walk down that path to detect which indexes are
>>available and useful for resolving the query.
>
> I guess we could make it work. It would make the query engine a bit more
> complex, and some of the queries would get a little bit slower (because a
> few more nodes would need to be read as they might contain index configs),
> but it's possible as far as I see.

The performance difference should be minimal, as all the relevant
index configuration nodes will be frequently accessed and thus cached
in memory. If there is a significant performance difference to
accessing another in-memory data structure, then we have a bug in our
cache.

As for complexity, we also gain from not having to maintain a separate
up-to-date in-memory representation of the index configuration, or
worry about keeping it in sync with changes to the in-content
configuration.
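
To make the path walk concrete, here is a rough sketch of the lookup.
It uses a hypothetical in-memory Node class and a made-up
availableIndexes() helper, not the actual Oak API, and it assumes each
node simply carries a list of index names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a content tree; not the Oak API. */
final class IndexLookupSketch {

    /** A node with child nodes and the names of indexes configured on it. */
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<String> indexes = new ArrayList<>();

        /** Returns the named child, creating it if it does not exist yet. */
        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    /**
     * Walks from the root down the query's path restriction and collects
     * every index configured on the way. "/" means no path restriction.
     */
    static List<String> availableIndexes(Node root, String path) {
        List<String> found = new ArrayList<>(root.indexes);
        Node current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skips the empty segment before the leading slash
            }
            current = current.children.get(segment);
            if (current == null) {
                break; // path does not exist; keep what was found so far
            }
            found.addAll(current.indexes);
        }
        return found;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.indexes.add("jcr:uuid");
        root.child("content").child("site").indexes.add("siteTitle");

        // Path-restricted query: sees the root index and the subtree index.
        System.out.println(availableIndexes(root, "/content/site/pages"));
        // No path restriction: only indexes configured at the root are seen.
        System.out.println(availableIndexes(root, "/"));
    }
}
```

The lookup reads one node per path segment, so the extra cost is
bounded by the depth of the path restriction, and there is no separate
structure to keep in sync.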

> The configuration of 'global' indexes (that affect the whole repository,
> such as the jcr:uuid index, the fulltext index) would still need to be
> stored at a fixed location (for example at the root node).

Yes, the root node can (and should) be an oak:indexed node.

> One problem is if the index config is stored at the wrong place, or if the
> query doesn't include the path restriction. For example if a config of a
> global index is stored under "/content" instead of "/", and then if the
> query doesn't explicitly use "/content", the index wouldn't be picked up.

That would be as designed. If you want to speed up a query that
doesn't contain a path restriction, you'd need to put the index under
the root node.

> Storing the index configs at a fixed location is still what I would
> prefer, because it is a very simple solution, and I still don't see very
> big advantages to store the config near the content.

In addition to the access control issue I mentioned earlier, this would
also allow us to migrate custom search indexes along with content. For
example, consider a web site or another content application stored in
a subtree of one repository. If you want to migrate it to another
repository (for example from development to production), it'll be
trivially easy to also include any custom indexes if they're
configured and stored in the same subtree.

On Wed, Sep 19, 2012 at 10:06 AM, Thomas Mueller <mueller@adobe.com> wrote:
> There is one more problem with storing the index config near the content.
> The index config doesn't just need to be read when running a query, but
> also when modifying data, in order to update the index data itself. If the
> index config isn't stored at a central place, then either the index isn't
> updated, or each time you store anything, all the parent nodes need to be
> read to pick up index configs.

The commit hook mechanism already provides a natural way to pick up
such information as you traverse down the tree to those areas that
are modified in a commit.
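
As a rough illustration of that traversal, the sketch below diffs two
versions of a tree and carries the index configs it meets on the way
down. The Node class and collectUpdates() are hypothetical and not the
Oak CommitHook API. It assumes unchanged subtrees are shared by
reference between the two versions, and it ignores removed nodes and
properties to stay short:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of a commit-time tree diff; not the Oak CommitHook API. */
final class CommitWalkSketch {

    /** A node with properties, children, and the property names indexed below it. */
    static final class Node {
        final Map<String, String> properties = new LinkedHashMap<>();
        final Map<String, Node> children = new LinkedHashMap<>();
        final Set<String> indexedProperties = new HashSet<>();
    }

    /**
     * Recursively compares two versions of a subtree. Index configs found on
     * the way down are carried in inScope, so only the ancestors of modified
     * nodes are read. Removed nodes and properties are not handled here.
     */
    static void collectUpdates(String path, Node before, Node after,
                               Set<String> inScope, List<String> updates) {
        Set<String> scope = new HashSet<>(inScope);
        scope.addAll(after.indexedProperties);

        for (Map.Entry<String, String> p : after.properties.entrySet()) {
            String old = before == null ? null : before.properties.get(p.getKey());
            if (scope.contains(p.getKey()) && !Objects.equals(old, p.getValue())) {
                updates.add(path + "@" + p.getKey());
            }
        }
        for (Map.Entry<String, Node> c : after.children.entrySet()) {
            Node oldChild = before == null ? null : before.children.get(c.getKey());
            if (oldChild != c.getValue()) { // same instance means unchanged subtree
                String childPath = path.equals("/")
                        ? "/" + c.getKey() : path + "/" + c.getKey();
                collectUpdates(childPath, oldChild, c.getValue(), scope, updates);
            }
        }
    }

    public static void main(String[] args) {
        Node pageBefore = new Node();
        pageBefore.properties.put("title", "old");
        Node pageAfter = new Node();
        pageAfter.properties.put("title", "new");

        Node contentBefore = new Node();
        contentBefore.indexedProperties.add("title");
        contentBefore.children.put("page", pageBefore);
        Node contentAfter = new Node();
        contentAfter.indexedProperties.add("title");
        contentAfter.children.put("page", pageAfter);

        Node rootBefore = new Node();
        rootBefore.children.put("content", contentBefore);
        Node rootAfter = new Node();
        rootAfter.children.put("content", contentAfter);

        List<String> updates = new ArrayList<>();
        collectUpdates("/", rootBefore, rootAfter, new HashSet<>(), updates);
        System.out.println(updates);
    }
}
```

The point is that the configs are picked up on the same descent that
finds the changes, so a commit doesn't need a separate pass over all
parent nodes.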

> A variation would be to store the index config at two places (at a central
> location and near the content). An internal observation handler could
> synchronize the two.

I'd really like to avoid the need to rely on observation for keeping
internal data structures in sync. It adds quite a bit of complexity
and risks hard-to-track inconsistency in internal state if there's a
bug in the relevant code.

> So I suggest we start with storing the index config at a central location,
> and then if we see a strong need we can still support a different solution.

We can start by supporting the oak:indexed mechanism only at the root
node, and extend it to support subtrees once there's a strong enough
need for that.

BR,

Jukka Zitting
