jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@adobe.com>
Subject Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........
Date Mon, 29 Nov 2010 13:19:30 GMT
On 29.11.10 13:21, "Ard Schrijvers" <a.schrijvers@onehippo.com> wrote:
>And this is a big burden! I think, we could have a single big index
>for the JCR spec implementation. But, I wouldn't solve this by having
>more small indexes, as collections. I would like to have an option, in
>case of XPath, like 'simpleXPath=true' where we limit some of the
>options: In other words, not all the jcr spec queries are available,
>but it is efficient and fast (we at Hippo limit ourselves to only
>efficient xpath queries). If you do not by default store all
>properties, and do not have to support complex path constraint (only
>simple ones), then, you wouldn't have to bother that much about one
>single Lucene index.

As written in my other mail, there are good reasons to allow for separate
indexes, to resolve conflicts of different indexing needs for different
applications. Maybe this is only true for the (node-scoped) full text
index, where you can't exclude certain properties at query time.

And the big advantage of those collections is that you solve the path
constraint issue, at least for those queries like:

/content/siteA//*[jcr:contains(., 'term') and @myProp='foo']

because you would have a collection for /content/siteA, /content/siteB,
etc. with just the right full text / property index.
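
To make the collection idea concrete, here is a minimal sketch of how a query's path constraint could be mapped to a per-subtree index. This is purely illustrative: `CollectionRouter` and `collectionFor` are hypothetical names, not existing Jackrabbit classes, and the sketch assumes each collection is identified by the root path it covers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Jackrabbit: picks the collection
// (per-subtree index) whose root path covers a query's path constraint.
class CollectionRouter {
    private final List<String> roots;

    CollectionRouter(List<String> roots) {
        this.roots = roots;
    }

    // Returns the longest configured root that is equal to, or an ancestor
    // of, the given path. Empty means no collection covers the path, so the
    // query would have to fall back to a global index.
    Optional<String> collectionFor(String queryPath) {
        String best = null;
        for (String root : roots) {
            // Match on whole path segments so /content/siteA does not
            // accidentally cover /content/siteAB.
            boolean covers = queryPath.equals(root)
                    || queryPath.startsWith(root + "/");
            if (covers && (best == null || root.length() > best.length())) {
                best = root;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        CollectionRouter router = new CollectionRouter(
                Arrays.asList("/content/siteA", "/content/siteB"));
        // Path prefix taken from the example query above.
        System.out.println(router.collectionFor("/content/siteA"));
        System.out.println(router.collectionFor("/content/siteC"));
    }
}
```

Once the right collection is chosen, the path constraint is already satisfied by every node in that index, so only the full text and property parts of the query remain to be evaluated.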

>Lucene 4.0 will be so blistering fast and efficient...


>the figures we
>need to index with Jackrabbit is peanuts for Lucene. *If* we improve
>indexing, a couple of hundreds of millions of nodes is a no-brainer!

With the exception of path constraints, as the path is not indexed. Maybe
it will be easier with Lucene 4.0 to index the path, and especially to allow
for fast updates of the path property when something is moved?

>We should not be thinking about problems that are a result of the
>current implementation and its shortcomings (they are a result of it
>having to work against Lucene 1.4; this is no criticism, to be sure!).


>asynchronous indexing is already part of the jcr 283 afaik and is
>allowed, certainly for binary content

Sure, but indexing still takes up a major part of a save() call, AFAIK.


Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel