jackrabbit-oak-dev mailing list archives

From Lukas Kahwe Smith <...@pooteeweet.org>
Subject Re: full text search improvements
Date Mon, 26 Mar 2012 13:51:27 GMT

On Mar 26, 2012, at 09:40 , Ard Schrijvers wrote:

> On Sat, Mar 24, 2012 at 3:12 PM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
>> Hi,
>> I am not a Jackrabbit developer but a very interested user and co-lead of the PHPCR [1] initiative.
>> I wanted to expand partially on what Ard said about potentially looking into hooking in Solr/ElasticSearch [2], and also raise some other issues I see with full text search in Jackrabbit:
>> 1) scaling
>> Now first up I am overall quite happy with the scalability of Jackrabbit 2.x.
>> Obviously there are two places though where at some point we need to support sharding: the persistence manager (which seems to be covered in the current Oak plans) and the Lucene index (which doesn't seem to be covered). Now imho there are already two perfectly fine projects working on this: Solr (the more natural choice since it's also an Apache project) and ElasticSearch (imho it provides a much better API).
>> Moreover, (optionally) leveraging these has several other advantages:
>> - mature products (especially ElasticSearch is very mature when it comes to sharding); supporting them might also attract new users to Jackrabbit
>> - handle much larger data sets via sharding
>> - provide many more full text search specific features
> What our customers also want is to be able to query on what a
> document is for the end user (customer): some customers have the
> author of a document stored as an 'author node' referenced by the
> 'document node'. Searching by the author's name then does not find the
> document, because the author's name is stored somewhere else.

Well, you can already do this via a JOIN, but I guess you are asking to be able to do some
more denormalization during the indexing process for better performance.
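
To make the denormalization idea concrete, here is a minimal sketch in Python. It is not how Jackrabbit or Oak index content; the in-memory `nodes` dict, the `author_ref` property and the helper names are all made up for illustration. The point is only that the referenced author's name is copied into the document's index entry at indexing time, so a search on the name finds the document without a JOIN.

```python
# Hypothetical in-memory content: path -> properties (not a real JCR API).
nodes = {
    "/authors/jane": {"type": "author", "name": "Jane Doe"},
    "/docs/report": {
        "type": "document",
        "title": "Annual report",
        "author_ref": "/authors/jane",  # reference to the author node
    },
}


def build_index_entry(path, nodes):
    """Build the index entry for a document node, copying the referenced
    author's name into it (the denormalization step)."""
    node = nodes[path]
    entry = {"path": path, "title": node["title"]}
    author = nodes.get(node.get("author_ref"))
    if author is not None:
        entry["author_name"] = author["name"]
    return entry


def search(entries, term):
    """Naive substring match over every indexed field except the path."""
    term = term.lower()
    return [
        entry["path"]
        for entry in entries
        if any(term in str(value).lower()
               for key, value in entry.items() if key != "path")
    ]


entries = [
    build_index_entry(path, nodes)
    for path, node in nodes.items()
    if node["type"] == "document"
]
print(search(entries, "jane"))
```

The obvious cost is that every document referencing an author has to be re-indexed when that author node changes.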

(Somewhat off topic, but we have this use case in our current application and we are concerned
that some "meta authors" might lead to too many such references. Not sure if addressing
this is part of Oak, so right now we "partition" the referrers by date, which is OK but
a bit annoying.)

>> 2) faceting
>> Now I mentioned faceting [4] above. Right now Jackrabbit does not even support COUNT() [5], which I find very painful and a major oversight. But really what people have come to expect from full text search is faceting. Imho it's so important that it should even be part of JCR 2.1 [6], and as you can see in this link it seems like HippoCMS developers agree that it's a very useful feature to have inside Jackrabbit.
> Yes, useful, but with hindsight, I wouldn't go for a seamless
> integration any more: we exposed it over virtual layers, but over
> the past years, for performance and memory reasons, I've changed my
> opinion and would rather not have faceted navigation exposed as
> virtual nodes. Still, being able to query the content over faceted
> navigation is desired by almost all customers.

OK, interesting.
Does your current solution include support for ACLs?

Lukas Kahwe Smith
