Date: Mon, 26 Mar 2012 13:14:58 +0200
Subject: Re: Re (OAK-36) Implement a query parser - what about indexing?
From: Ard Schrijvers
To: oak-dev@jackrabbit.apache.org

On Mon, Mar 26, 2012 at 12:56 PM, Jukka Zitting wrote:
> Hi,
>
> There's a number of points in this thread that I wanted to address, so
> instead of replying to them individually, let me try to summarize my
> thinking.
>
> One of the bigger pain points in the Jackrabbit 2.x architecture has
> been the query engine and the workspace-global query index that has
> been pretty difficult to customize for special needs and to handle in
> terms of backup/recovery and scaling to multiple cluster nodes. My
> wish for Oak is that we come up with a much more flexible search and
> indexing architecture that solves these issues and is easy to extend
> for any future use cases we may encounter.
>
> I think the biggest issue, as brought up by Alex and then elaborated
> by Ard, is the way we handle indexing. Instead of having a single,
> more or less fixed index for a repository like in Jackrabbit 2.x, Oak
> should provide generic extension points that various different kinds
> of indexing components could hook into.
> We should have at least three such extension points: pre- and
> post-commit hooks, and observation based on the commit journal.
>
> For example a low-level UUID-to-path index should preferably use the
> pre-commit hook for atomic index updates as a part of each commit. A
> post-commit hook could be used to trigger full-text extraction of
> nt:file binaries, a bit like we currently do in Jackrabbit 2.x. And an
> observation client could use the commit journal to feed an external
> Solr index for application-level index features. A given deployment
> can choose which ones of these and any other indexing components are
> needed based on relevant application needs and related
> performance/scalability overhead. A single solution does not fit all
> needs, so we need to make such customization as easy as possible.
>
> On the other hand there's a lot of value in having a single, unified
> query abstraction instead of having client applications reach out
> directly to Solr, Lucene, or custom indexes. Thus, in addition to the
> extension points for indexing, we need a way for the indexing
> components to extend the Oak query engine with ways to evaluate given
> queries against the various configured indexes. This way all
> applications can use the same generic Oak query API (exposed through
> QueryManager in JCR, DASL in WebDAV, and/or something else in JSOP)
> while leveraging the custom indexes available in each deployment.

Thanks for this summary. I now really understand what the goals are
and how to achieve them. I especially like the unified, generic Oak
query API. For Hippo, I am currently doing something similar for the
query API: it can seamlessly delegate to Solr or Jackrabbit, both
returning a JCR node iterator (although the Solr index, through SolrJ,
can also return plain POJOs).
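
A minimal sketch of such a delegating query API, assuming hypothetical
names throughout (QueryBackend, UnifiedQueryEngine and FixedBackend are
illustrative only and are not part of Oak, Jackrabbit, Solr or Hippo).
It only shows the shape of the idea: callers talk to one engine, and
the engine picks whichever registered backend can evaluate the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in Oak, Jackrabbit or Hippo.

/** One pluggable index backend, e.g. a Lucene-backed or Solr-backed one. */
interface QueryBackend {
    /** Returns true if this backend can evaluate the given statement. */
    boolean canHandle(String statement);

    /** Returns the paths of the matching nodes. */
    List<String> execute(String statement);
}

/** Single entry point that hides which backend answers a query. */
class UnifiedQueryEngine {
    private final List<QueryBackend> backends = new ArrayList<>();

    /** Backends are consulted in registration order. */
    void register(QueryBackend backend) {
        backends.add(backend);
    }

    /** Delegates to the first backend that accepts the statement. */
    List<String> query(String statement) {
        for (QueryBackend backend : backends) {
            if (backend.canHandle(statement)) {
                return backend.execute(statement);
            }
        }
        throw new IllegalArgumentException("No backend can handle: " + statement);
    }
}

/** Toy backend standing in for a real index; it answers from a fixed list. */
class FixedBackend implements QueryBackend {
    private final String prefix;
    private final List<String> paths;

    FixedBackend(String prefix, String... paths) {
        this.prefix = prefix;
        this.paths = Collections.unmodifiableList(Arrays.asList(paths));
    }

    @Override
    public boolean canHandle(String statement) {
        // Assumption for the sketch: the statement prefix selects the backend.
        return statement.startsWith(prefix);
    }

    @Override
    public List<String> execute(String statement) {
        return paths;
    }
}
```

Registering one backend per index and calling query() on the engine
keeps the caller unaware of where the result came from. Selecting a
backend by statement prefix is a simplification made for this sketch; a
real engine would presumably choose based on the parsed query.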

I really like the first option (the pre-commit example) and the third
(observation-based), but I still see many obstacles ahead for the
second (full-text extraction on post-commit).

I have one more question regarding Oak search and indexes: will we be
able to run queries that return something other than JCR nodes/rows? I
frequently want a query result from the repository that cannot be
returned as a node iterator, for example a query on stats, or an
auto-completion query on some property (returning part of the TermEnum,
for example).

Regards,
Ard

>
> BR,
>
> Jukka Zitting

--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142
US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com