jackrabbit-oak-dev mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: full text search improvements
Date Mon, 26 Mar 2012 13:40:45 GMT
On Sat, Mar 24, 2012 at 3:12 PM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
> Hi,
>
> I am not a Jackrabbit developer but a very interested user and co-lead of the PHPCR [1] initiative.
> I wanted to expand partially on what Ard said about potentially looking into hooking in Solr/ElasticSearch [2], and also raise some other issues I see with full text search in Jackrabbit 2.x.
>
> 1) scaling
>
> Now, first up, I am overall quite happy with the scalability of Jackrabbit 2.x.
> Obviously, though, there are two places where at some point we need to support sharding: the persistence manager (which seems to be covered in the current Oak plans) and the Lucene index (which doesn't seem to be covered). Now imho there are already two perfectly fine projects working on this: Solr (the more natural choice, since it's also an Apache project) and ElasticSearch (imho it provides a much better API).
>
> Moreover, (optionally) leveraging these has several other advantages:
> - mature products (ElasticSearch especially is very mature when it comes to sharding); supporting them might also attract new users to Jackrabbit
> - handle much larger data sets via sharding
> - provide many more full text search specific features

What our customers also want is to be able to query on what a
document is from the end user's (customer's) point of view. Some
customers model the author of a document as a separate 'author node'
referenced by the 'document node'. Searching by the author's name then
does not find the document, because the author's name is stored on
another node.

Are there plans to also have some OCM mapping for JR3? It might make
sense to be able to create external indexes by annotating OCM beans.
That way you also get the API for the search result, as it will just
return the OCM POJOs. This is actually the approach I want to take
for the content beans we have (where a developer can specify through
annotations how to index).

Indexes can get a bit out of sync when a referenced node changes
(think of a changed author name), but IMO that is acceptable for
full-text indexes.
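To make the "find the document by its author's name" idea concrete, here is a minimal Java sketch of annotation-driven indexing that copies a referenced bean's value into the index entry of the document that references it. The `@Indexed` and `@IndexedReference` annotations, the `Author`/`Document` beans and `toIndexDocument` are all hypothetical illustrations; they are not part of Jackrabbit OCM, Hippo's content beans, Solr or ElasticSearch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotatedIndexSketch {

    // Hypothetical annotation: index this field's value under the given name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {
        String name();
    }

    // Hypothetical annotation: the field references another bean; index one
    // of that bean's properties under the given name on *this* document.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IndexedReference {
        String name();
        String property();
    }

    // Hypothetical content beans: the author lives in its own bean/node.
    static class Author {
        String name;
        Author(String name) { this.name = name; }
    }

    static class Document {
        @Indexed(name = "title") String title;
        @IndexedReference(name = "author", property = "name") Author author;
        Document(String title, Author author) { this.title = title; this.author = author; }
    }

    // Flattens a bean into field/value pairs for an external index, copying
    // referenced values in at index time so that a search on the author's
    // name matches the document itself.
    static Map<String, String> toIndexDocument(Object bean) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            for (Field f : bean.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                Object value = f.get(bean);
                if (value == null) {
                    continue;
                }
                Indexed indexed = f.getAnnotation(Indexed.class);
                if (indexed != null) {
                    fields.put(indexed.name(), value.toString());
                }
                IndexedReference ref = f.getAnnotation(IndexedReference.class);
                if (ref != null) {
                    Field target = value.getClass().getDeclaredField(ref.property());
                    target.setAccessible(true);
                    Object refValue = target.get(value);
                    if (refValue != null) {
                        fields.put(ref.name(), refValue.toString());
                    }
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot index " + bean, e);
        }
        return fields;
    }

    public static void main(String[] args) {
        Document doc = new Document("Full text search improvements", new Author("Jane Doe"));
        System.out.println(toIndexDocument(doc));
    }
}
```

The author's name is copied as a snapshot when the document is indexed, so renaming the author leaves existing index entries stale until the referencing documents are reindexed. That is exactly the out-of-sync trade-off mentioned for full-text indexes.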

> - less pressure on Jackrabbit to support these features [3] [4]
> - as these are both Lucene-based, the amount of code needed (for example to convert QOM to Solr/ElasticSearch) will be minimal
>
> ---
>
> 2) faceting
>
> Now, I mentioned faceting [4] above. Right now Jackrabbit does not even support COUNT() [5], which I find very painful and a major oversight. But really, what people have come to expect from full text search is faceting. Imho it's so important that it should even be part of JCR 2.1 [6], and as you can see at that link, HippoCMS developers seem to agree that it's a very useful feature to have inside Jackrabbit.

Yes, useful, but in hindsight I wouldn't go for a seamless
integration any more. We exposed it through virtual layers, but over
the past years, for performance and memory reasons, I've changed my
opinion: I'd rather not have faceted navigation exposed as virtual
nodes. Still, being able to query the content through faceted
navigation is desired by almost all customers.
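For readers less familiar with the term: faceting returns, alongside the hits, how many hits carry each value of a field, and COUNT() is just the total number of hits. The minimal Java sketch below assumes the hits are already materialized as hypothetical field/value maps. It only illustrates what the counts mean; Solr and ElasticSearch compute them inside the index, and this is not Jackrabbit or Hippo code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetCountSketch {

    // Counts how many hits carry each value of the given facet field;
    // hits without that field are simply not counted.
    static Map<String, Integer> facetCounts(List<Map<String, String>> hits, String facetField) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(facetField);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Orders facet values by descending count, breaking ties alphabetically,
    // a common way to list them in a navigation menu.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
                Map.of("type", "news", "author", "Jane"),
                Map.of("type", "article", "author", "Jane"),
                Map.of("type", "news", "author", "John"));
        // Total number of hits, i.e. what a COUNT() over the same query reports.
        System.out.println(hits.size());                               // prints 3
        System.out.println(sortedByCount(facetCounts(hits, "type")));   // prints [news=2, article=1]
        System.out.println(sortedByCount(facetCounts(hits, "author"))); // prints [Jane=2, John=1]
    }
}
```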

>

Regards Ard
