jackrabbit-oak-dev mailing list archives

From: Jukka Zitting <jukka.zitt...@gmail.com>
Subject: Re: Re (OAK-36) Implement a query parser - what about indexing?
Date: Mon, 26 Mar 2012 10:56:56 GMT
Hi,

There are a number of points in this thread that I want to address, so
instead of replying to each of them individually, let me try to
summarize my thinking.

One of the bigger pain points in the Jackrabbit 2.x architecture has
been the query engine with its workspace-global query index, which has
been pretty difficult to customize for special needs and to handle in
terms of backup/recovery and scaling to multiple cluster nodes. My
wish for Oak is that we come up with a much more flexible search and
indexing architecture that solves these issues and is easy to extend
for any future use cases we may encounter.

I think the biggest issue, as brought up by Alex and then elaborated
by Ard, is the way we handle indexing. Instead of having a single,
more or less fixed index per repository as in Jackrabbit 2.x, Oak
should provide generic extension points that various kinds of
indexing components can hook into. We should have at least three
such extension points: pre- and post-commit hooks, and observation
based on the commit journal.
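
To make this a bit more concrete, here is a rough sketch (in Java) of
what these extension points might look like. All of the names below
are purely illustrative for the sake of this discussion, not a
proposal for the actual Oak API:

    // Placeholder for Oak's immutable snapshot of (a part of) the content tree.
    interface NodeState {}

    // Thrown by a pre-commit hook to reject a commit.
    class CommitFailedException extends Exception {}

    // Runs synchronously as part of each commit; can update an index
    // atomically with the content change, or reject the commit.
    interface PreCommitHook {
        NodeState beforeCommit(NodeState before, NodeState after)
                throws CommitFailedException;
    }

    // Runs after a commit has been persisted; good for more expensive
    // work like full text extraction that should not block the commit.
    interface PostCommitHook {
        void afterCommit(NodeState before, NodeState after);
    }

    // Fed asynchronously from the commit journal, possibly on another
    // cluster node; good for external indexes that can lag a bit behind.
    interface JournalObserver {
        void contentChanged(NodeState before, NodeState after);
    }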

For example, a low-level UUID-to-path index should preferably use the
pre-commit hook so that the index is updated atomically as part of
each commit. A post-commit hook could be used to trigger full-text
extraction of nt:file binaries, a bit like we currently do in
Jackrabbit 2.x. And an observation client could use the commit
journal to feed an external Solr index for application-level index
features. A given deployment can then choose which of these (and any
other) indexing components are needed, based on application needs and
the related performance/scalability overhead. A single solution does
not fit all needs, so we need to make such customization as easy as
possible.
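
To illustrate the observation case, something like the following
could feed changed nodes to Solr. Only the SolrJ classes (SolrServer,
SolrInputDocument) are existing API; the rest, including how changes
are delivered to nodeChanged(), is hypothetical and left open here:

    import java.util.Map;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Sketch of an observation client that feeds an external Solr index.
    // The point is that this component lives outside the commit path.
    public class SolrIndexFeeder {

        private final SolrServer solr;

        public SolrIndexFeeder(SolrServer solr) {
            this.solr = solr;
        }

        // Called for each node reported as changed by the commit journal.
        public void nodeChanged(String path, Map<String, String> properties)
                throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("path", path);
            for (Map.Entry<String, String> property : properties.entrySet()) {
                doc.addField(property.getKey(), property.getValue());
            }
            solr.add(doc);
            solr.commit(); // a real feeder would batch commits
        }
    }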

On the other hand, there's a lot of value in having a single, unified
query abstraction instead of having client applications reach out
directly to Solr, Lucene, or custom indexes. Thus, in addition to the
extension points for indexing, we need a way for the indexing
components to extend the Oak query engine so that it can evaluate
queries against the various configured indexes. This way all
applications can use the same generic Oak query API (exposed through
QueryManager in JCR, DASL in WebDAV, and/or something else in JSOP)
while leveraging the custom indexes available in each deployment.
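
Again just to illustrate the idea, the extension point on the query
side could be roughly along the following lines (all names
hypothetical, not a concrete API proposal):

    // Placeholder for the parsed constraints of a query, as passed
    // down by the Oak query engine.
    interface Filter {}

    // Implemented by an indexing component to make its index
    // available to the query engine.
    interface QueryIndex {

        // Estimated cost of using this index for the given constraints,
        // so the engine can pick the cheapest applicable index.
        double getCost(Filter filter);

        // Paths of nodes that match the constraints; the engine still
        // applies access control and any remaining conditions on top.
        Iterable<String> query(Filter filter);
    }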

BR,

Jukka Zitting
