jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........
Date Mon, 29 Nov 2010 12:21:39 GMT
On Mon, Nov 29, 2010 at 1:06 PM, Alexander Klimetschek
<aklimets@adobe.com> wrote:

>>The only drawback is that the current Jackrabbit Lucene implementation does
>>not fit the InfinispanDirectory (Infinispan's Lucene Directory). This is
>>because of the multi-index, never-reopen setup in Jackrabbit: it was state
>>of the art against Lucene 1.4, but is now mostly redundant.
> Just one node doing the indexing sounds interesting. But then I think we
> should store the index inside the repository (as a randomly accessible
> binary), so that you can use any persistence manager and the
> implementation is simpler (no need to adapt to the various databases).

You are completely right! Good point. :-)

> We had some plans to do something like this with additional indexes
> (calling them "collections") that are created by the application side but
> stored inside the repository, and implemented with Lucene (especially for
> the full-text part).

Hmmm... personally, I wouldn't go this route. I think your next line is
closer to what I have in mind:

> The idea here is to overcome the problem of the single big index for the
> entire repository that is mandated by the JCR spec. You often want indexes

And this is a big burden! I think we could keep a single big index for
the JCR spec implementation, but I wouldn't solve this by adding more
small indexes as collections. Instead, I would like an option for XPath,
something like 'simpleXPath=true', that restricts what is supported: not
all JCR spec queries are available, but the ones that are run efficiently
and fast (at Hippo we limit ourselves to efficient XPath queries only).
If you do not store all properties by default, and only have to support
simple path constraints rather than complex ones, then a single Lucene
index is not much of a concern.

Lucene 4.0 will be blisteringly fast and efficient; the volumes we need
to index with Jackrabbit are peanuts for Lucene. *If* we improve
indexing, a couple of hundred million nodes is a no-brainer! We should
not be designing around problems that stem from the current
implementation and its shortcomings (which exist because it had to work
against Lucene 1.4; to be sure, this is no criticism!).

> that are only for part of a repository (e.g. /content/siteA) and are
> asynchronous (not blocking other repository writes) and can be more easily
> thrown away, updated etc. without breaking core repository functionality.

Asynchronous indexing is, AFAIK, already allowed by JSR 283 (JCR 2.0),
certainly for binary content.
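For illustration only, a minimal Java sketch of the kind of index Alex
describes: it covers a single subtree (here /content/siteA) and is updated
on a background thread, so the code that saves a node does not wait for
indexing. The class SubtreeAsyncIndex and its methods are hypothetical,
not a Jackrabbit or JCR API, and a ConcurrentHashMap stands in for a real
Lucene index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical subtree-scoped, asynchronously updated index (not a Jackrabbit API). */
public class SubtreeAsyncIndex {
    private final String root;
    // Stand-in for a real Lucene index: node path -> indexed text.
    private final Map<String, String> index = new ConcurrentHashMap<>();
    // A single background thread applies updates in submission order.
    private final ExecutorService indexer = Executors.newSingleThreadExecutor();

    public SubtreeAsyncIndex(String root) {
        this.root = root;
    }

    /** True if the path is the subtree root itself or lies below it. */
    public boolean covers(String path) {
        return path.equals(root) || path.startsWith(root + "/");
    }

    /** Called on the write path; returns without waiting for indexing. */
    public void nodeSaved(String path, String text) {
        if (!covers(path)) {
            return; // outside the subtree: this index ignores the node
        }
        indexer.execute(() -> index.put(path, text));
    }

    /** Lets pending updates finish, then stops the background thread. */
    public void close() {
        indexer.shutdown();
        try {
            indexer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isIndexed(String path) {
        return index.containsKey(path);
    }

    public static void main(String[] args) {
        SubtreeAsyncIndex idx = new SubtreeAsyncIndex("/content/siteA");
        idx.nodeSaved("/content/siteA/news/item1", "hello");
        idx.nodeSaved("/content/siteB/item2", "ignored");
        idx.close();
        System.out.println(idx.isIndexed("/content/siteA/news/item1")); // true
        System.out.println(idx.isIndexed("/content/siteB/item2"));      // false
    }
}
```

Because such an index holds only derived data, it can be thrown away and
rebuilt without affecting repository content. In this sketch, calling
nodeSaved after close() is rejected by the executor.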

>>Anyway, in due time we need to pick this up on the dev list
> Of course.

To be continued :-)

Regards Ard

> Regards,
> Alex
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
USA  •  San Francisco  185 H Street Suite B  •  Petaluma CA 94952-5100  •  +1 (707) 773 4646
Canada  •  Montréal  5369 Boulevard St-Laurent  •  Montréal QC H2T 1S5  •  +1 (514) 316 8966
www.onehippo.com  •  www.onehippo.org  •  info@onehippo.com
