jackrabbit-oak-dev mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Re (OAK-36) Implement a query parser - what about indexing?
Date Mon, 26 Mar 2012 11:14:58 GMT
On Mon, Mar 26, 2012 at 12:56 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> There's a number of points in this thread that I wanted to address, so
> instead of replying to them individually, let me try to summarize my
> thinking.
>
> One of the bigger pain points in the Jackrabbit 2.x architecture has
> been the query engine and the workspace-global query index that has
> been pretty difficult to customize for special needs and to handle in
> terms of backup/recovery and scaling to multiple cluster nodes. My
> wish for Oak is that we come up with a much more flexible search and
> indexing architecture that solves these issues and is easy to extend
> for any future use cases we may encounter.
>
> I think the biggest issue, as brought up by Alex and then elaborated
> by Ard, is the way we handle indexing. Instead of having a single,
> more or less fixed index for a repository like in Jackrabbit 2.x, Oak
> should provide generic extension points that various different kinds
> of indexing components could hook into. We should have at least three
> such extension points: pre- and post-commit hooks, and observation
> based on the commit journal.
>
> For example a low-level UUID-to-path index should preferably use the
> pre-commit hook for atomic index updates as a part of each commit. A
> post-commit hook could be used to trigger full-text extraction of
> nt:file binaries, a bit like we currently do in Jackrabbit 2.x. And an
> observation client could use the commit journal to feed an external
> Solr index for application-level index features. A given deployment
> can choose which ones of these and any other indexing components are
> needed based on relevant application needs and related
> performance/scalability overhead. A single solution does not fit all
> needs, so we need to make such customization as easy as possible.
>
> On the other hand there's a lot of value in having a single, unified
> query abstraction instead of having client applications reach out
> directly to Solr, Lucene, or custom indexes. Thus, in addition to the
> extension points for indexing, we need a way for the indexing
> components to extend the Oak query engine with ways to evaluate given
> queries against the various configured indexes. This way all
> applications can use the same generic Oak query API (exposed through
> QueryManager in JCR, DASL in WebDAV, and/or something else in JSOP)
> while leveraging the custom indexes available in each deployment.
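
The three extension points named in the quoted message (pre-commit hook, post-commit hook, journal-based observation) can be sketched roughly as follows. This is only an illustration of how they differ in timing; every type and method name here (Commit, CommitPipeline, addPreCommitHook, drainJournal, and so on) is hypothetical and not an Oak API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of these are Oak APIs.
final class Commit {
    final List<String> changedPaths;
    Commit(List<String> changedPaths) { this.changedPaths = changedPaths; }
}

interface PreCommitHook { void beforeCommit(Commit commit); }
interface PostCommitHook { void afterCommit(Commit commit); }
interface JournalObserver { void onJournalEntry(Commit commit); }

final class CommitPipeline {
    private final List<PreCommitHook> preHooks = new ArrayList<>();
    private final List<PostCommitHook> postHooks = new ArrayList<>();
    private final List<JournalObserver> observers = new ArrayList<>();
    private final List<Commit> journal = new ArrayList<>();
    private int delivered = 0; // journal entries already sent to observers

    void addPreCommitHook(PreCommitHook hook) { preHooks.add(hook); }
    void addPostCommitHook(PostCommitHook hook) { postHooks.add(hook); }
    void addObserver(JournalObserver observer) { observers.add(observer); }

    void commit(Commit commit) {
        // Pre-commit hooks run before the commit is recorded, so an
        // exception thrown here prevents the commit from being journaled.
        // A UUID-to-path index would update itself at this point.
        for (PreCommitHook hook : preHooks) {
            hook.beforeCommit(commit);
        }
        journal.add(commit);
        // Post-commit hooks run after the commit is recorded, e.g. to
        // trigger full-text extraction.
        for (PostCommitHook hook : postHooks) {
            hook.afterCommit(commit);
        }
    }

    // Observers consume the journal later, decoupled from the commit
    // itself, e.g. to feed an external index.
    void drainJournal() {
        for (; delivered < journal.size(); delivered++) {
            for (JournalObserver observer : observers) {
                observer.onJournalEntry(journal.get(delivered));
            }
        }
    }
}
```

In this sketch a deployment picks its indexing components simply by registering only the hooks and observers it needs.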

Thanks for this summary. I now really understand what the goals are
and how to achieve them. I especially like the unified, generic Oak
query API. For Hippo, I am currently doing something similar: a query
API that seamlessly delegates to either Solr or Jackrabbit, both
returning a JCR node iterator (although the Solr index, through SolrJ,
can also return plain POJOs). I really like the first option (the
pre-commit example) and the third (observation based), but I still see
many obstacles for the second (full-text extraction on post-commit).
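
As a rough illustration of the delegation idea (one query entry point, several index backends behind it), here is a minimal sketch. All names (QueryIndex, UnifiedQueryEngine, PrefixIndex) are made up for this example, and the "prefix:" query syntax is an assumption used only to keep the stub backends trivial; it reflects neither the Oak design nor the Hippo implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension point: an index backend that can answer some queries.
interface QueryIndex {
    boolean canEvaluate(String query);
    List<String> evaluate(String query); // returns matching node paths
}

// Single entry point that client code talks to; it delegates to the
// first registered index that claims it can evaluate the query.
final class UnifiedQueryEngine {
    private final List<QueryIndex> indexes = new ArrayList<>();

    void register(QueryIndex index) { indexes.add(index); }

    List<String> execute(String query) {
        for (QueryIndex index : indexes) {
            if (index.canEvaluate(query)) {
                return index.evaluate(query);
            }
        }
        throw new IllegalArgumentException("No index can evaluate: " + query);
    }
}

// Stub backend standing in for e.g. a Solr-backed or Lucene-backed index.
// It handles queries that start with a given prefix and returns fixed paths.
final class PrefixIndex implements QueryIndex {
    private final String prefix;
    private final List<String> paths;

    PrefixIndex(String prefix, List<String> paths) {
        this.prefix = prefix;
        this.paths = paths;
    }

    @Override
    public boolean canEvaluate(String query) { return query.startsWith(prefix); }

    @Override
    public List<String> evaluate(String query) { return paths; }
}
```

Callers only ever see UnifiedQueryEngine.execute, so backends can be added or swapped per deployment without touching client code.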

I have one more question regarding Oak search and indexes: will we be
able to run queries that return something other than JCR nodes/rows? I
frequently want a query result from the repository that cannot be
returned as a node iterator, for example a query on statistics, or a
query for 'auto-completion' on some property (thus returning part of
the TermEnum, for example).
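
To make the auto-completion case concrete: the result is a list of index terms rather than nodes. The sketch below imitates a prefix scan over a sorted term dictionary using a plain TreeSet. It does not use the Lucene TermEnum API, and the TermDictionary class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for the sorted term dictionary of one property.
final class TermDictionary {
    private final NavigableSet<String> terms = new TreeSet<>();

    void add(String term) { terms.add(term); }

    // Returns up to 'limit' terms starting with 'prefix', in sorted order.
    // The result is a list of plain strings, not nodes or rows.
    List<String> complete(String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        // Terms are sorted, so all matches form one contiguous run
        // beginning at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || matches.size() >= limit) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```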

Regards, Ard

>
> BR,
>
> Jukka Zitting



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
