jackrabbit-oak-dev mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: full text search improvements
Date Mon, 26 Mar 2012 14:10:58 GMT
On Mon, Mar 26, 2012 at 3:54 PM, Thomas Mueller <mueller@adobe.com> wrote:
> Hi,
>
>>What our customers also want, is to be able to query on what a
>>document for the end-user (customer) is : Some customers have the
>>author of a document being some 'author node' referenced by the
>>'document node' : Now, by the author's name, you do not find the
>>document, because the authors name is stored somewhere else.
>
> This sounds like a join to me, like:
>
>    select * from document d inner join author a on a.id = d.authorId
>
> I would expect the JCR SQL-2 query to look similar.

I haven't looked at or tested JCR joins. I just can't imagine that they
scale well enough, but perhaps this has more to do with my 'Lucene 1.4
experience' :)

>
>>Are there plans to also have some ocm mapping for jr3?
>
> Not directly, that is, not within oak-jcr, oak-core, and oak-mk.
>
>> It might make
>>sense, to be able to create external indexes by annotating ocm beans
>
> I don't think oak-core should depend on OCM.

No, agreed

> But your index implementation
> (should we call it "query index"?) could use OCM, and the query engine
> could be configured to use your index implementation.

Yes, that's pretty much how I'd hope it could work
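
To make that concrete, here is a minimal Java sketch of such a pluggable
index. None of these names are Oak or OCM API: QueryIndex,
AuthorAwareIndex, and the map standing in for the author lookup are all
hypothetical. The sketch indexes a document under the terms of its own
text plus the name of the author node it references, so that a search
for the author's name finds the document.

```java
import java.util.*;

/** Hypothetical index contract; not an actual Oak interface. */
interface QueryIndex {
    void index(String docPath, Map<String, String> properties);
    Set<String> search(String term);
}

/** Copies the referenced author's name into the document's index entry. */
final class AuthorAwareIndex implements QueryIndex {
    // Stand-in for a repository lookup: author node id -> author name.
    private final Map<String, String> authorNames;
    // Inverted index: lower-cased term -> paths of matching documents.
    private final Map<String, Set<String>> postings = new HashMap<>();

    AuthorAwareIndex(Map<String, String> authorNames) {
        this.authorNames = authorNames;
    }

    @Override
    public void index(String docPath, Map<String, String> properties) {
        addTerms(docPath, properties.get("text"));
        // Follow the reference and index the author's name with the document.
        String authorId = properties.get("authorId");
        if (authorId != null) {
            addTerms(docPath, authorNames.get(authorId));
        }
    }

    @Override
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySet());
    }

    private void addTerms(String docPath, String text) {
        if (text == null) {
            return;
        }
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docPath);
            }
        }
    }
}
```

Because the author's name is copied at index time, the entry stays stale
after a rename until the referencing documents are indexed again.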

>
>>Indexes can be a bit out of sync, when some reference node changes
>>(think about a changing author name), but imo acceptable for full text
>>indexes
>
> Yes, I think fulltext search doesn't need to be real-time.
>
>>We exposed it over virtual layers, but, during
>>the past years, performance and memory wise, I've switched my opinion
>>that I'd rather opt for not having faceted navigation exposed as
>>virtual nodes.
>
> Are virtual nodes a performance / memory problem? I don't see why this
> should be the case for Oak. But if it turns out that regular nodes are
> simpler, then maybe you should create regular nodes... Those could be
> maintained by your index implementation. For example, one node for each
> "fulltext search term".

I am not sure whether it would be an issue for Oak, but for Jackrabbit 1
and 2 we build up a JCR session that keeps virtual node states in
memory. This can grow too large, and it is not easy to limit. Also,
since we typically have many millions of JCR nodes but only a couple of
hundred thousand documents, the built-in faceted navigation is too
CPU-demanding.

Another disadvantage, imo, of our current 'seamless' integration, which
exposes faceted navigation over virtual layers, is that you cannot
write to these nodes: some virtual nodes don't even have a canonical
equivalent. This also makes the virtual structure less obvious to use
for third parties.

Either way, it might be more a problem of our current technical
implementation than of Oak, but I think it is all much easier if we
do not expose faceting over a node structure. Perhaps a row structure,
where some 'rows' do not have a backing JCR node?
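
A rough Java sketch of what I mean by rows, with every name (FacetRow,
Facets, the map-based documents) made up for illustration and not taken
from any existing API. Each row carries a facet value and a count; the
backing node path is optional, and aggregate rows simply leave it empty.

```java
import java.util.*;

/** Hypothetical facet result row; the backing node path may be absent. */
final class FacetRow {
    final String facet;
    final String value;
    final int count;
    final Optional<String> nodePath;

    FacetRow(String facet, String value, int count, Optional<String> nodePath) {
        this.facet = facet;
        this.value = value;
        this.count = count;
        this.nodePath = nodePath;
    }
}

final class Facets {
    /** Counts documents per distinct value of one property. */
    static List<FacetRow> count(String facet, List<Map<String, String>> docs) {
        // TreeMap keeps the rows sorted by facet value.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(facet);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<FacetRow> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // A count is computed, so there is no node behind this row.
            rows.add(new FacetRow(facet, e.getKey(), e.getValue(), Optional.empty()));
        }
        return rows;
    }
}
```

Nothing has to be held in the session as virtual node state here, and
since the rows are plain results there is no expectation that they can
be written to.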

Regards Ard

>
> Regards,
> Thomas
>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
