jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Faceted Search Implementation
Date Wed, 25 Aug 2010 10:50:25 GMT
On Wed, Aug 25, 2010 at 12:23 PM, Ian Boston <ieb@tfd.co.uk> wrote:
> Ard,
> Thank you for the guided tour, most informative.

You're welcome.

> We have complex ACLs based on the standard Jackrabbit 2 ACLs with some additions including
> external lookup. These change rapidly, so count by iteration looks like the only way at the
> moment, although we have found that most of the time, where there are > 10 pages of results,
> no one pages that far, so social engineering is one solution (e.g. "> 1000 items"), so we
> just count up to that number...


Do you also have something like time-based ACLs (like 'now', which
changes every millisecond), or do you have 'static' ACLs? If they are
static, you can follow a quite different approach. Whether it is
feasible again depends on the number of unique ACL rule sets across
JCR sessions, on how large your data set is, and on how much time you
want to put into it, but:

1) Extend the existing SearchIndex.
2) When a search is done, compute for the JCR session's ACL some
kind of 'token' that identifies the ACL rule set for that session (users
with identical rule sets get the same token).
3) To every ReadOnlyIndexReader, which already contains an in-memory
deleted bitset, add an 'authorized bitset'. This means that every time a
search comes in with a *new* unique token, you have to authorize
every Lucene Document once to get the auth bitset for that token. This
shouldn't be too hard. After this, you associate the cached auth bitset
with this token. Now every other user with the same token also has an
in-memory cached bitset.
4) Your searches are done on your 'extended SearchIndex', which
consists of a set of Lucene ReadOnlyIndexReaders, which in turn have
an extra filter for the authorization. Thus, Lucene returns
only authorized hits.
5) Add some API call or something that exposes
QueryResultImpl#getTotalSize(). This initially returns the
Lucene hit count, but, as you have already made it 'authorized', it returns
the correct hit count instantly without having to check access for
every hit. I actually also still have this one open for our repo [1].
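
Steps 2 to 5 could be sketched roughly as follows. This is only an
illustration in plain Java under stated assumptions, not Jackrabbit or
Lucene code: AuthorizedReaderSketch, tokenFor and the canRead callback
are hypothetical names, documents are modelled as plain int ids, and
the ACLs are assumed not to change while a cached bitset is in use.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical stand-in for one read-only index reader; not a Jackrabbit or Lucene class.
final class AuthorizedReaderSketch {
    private final int maxDoc; // number of documents in this reader
    private final Map<String, BitSet> authBitsByToken = new ConcurrentHashMap<>();
    private final AtomicInteger authorizationPasses = new AtomicInteger();

    AuthorizedReaderSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Step 2 (assumed encoding): sessions with the same set of rules get the same token.
    static String tokenFor(Collection<String> aclRules) {
        return String.join("|", new TreeSet<>(aclRules));
    }

    // Step 3: authorize every document once per new token, then reuse the cached bitset.
    BitSet authBits(String token, IntPredicate canRead) {
        return authBitsByToken.computeIfAbsent(token, t -> {
            authorizationPasses.incrementAndGet();
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (canRead.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Steps 4 and 5: intersect the raw hits with the auth bitset; the
    // cardinality is the authorized hit count, with no per-hit access check.
    int authorizedHitCount(BitSet rawHits, String token, IntPredicate canRead) {
        BitSet visible = (BitSet) rawHits.clone();
        visible.and(authBits(token, canRead));
        return visible.cardinality();
    }

    // How many times a full authorization pass over the reader was needed.
    int authorizationPasses() {
        return authorizationPasses.get();
    }
}
```

A second session whose rules yield the same token reuses the cached
bitset, so the canRead check is not called again for it.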

Note that if new documents are added to the repository, all existing
auth bitsets for all existing ReadOnlyIndexReaders are still valid!
Only a new index reader is added. For this new one, you'll still need
to create the auth bitset when a search comes in, but this is
always a small index containing few nodes.
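
To illustrate why existing bitsets stay valid, here is a rough sketch
with one bitset cache per reader. Again this is plain Java under
assumptions, not the Jackrabbit implementation: AuthSegment and
SegmentedAuthIndex are hypothetical names, documents are modelled as
path strings, and readers are treated as immutable once added.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical immutable index segment with its own per-token auth bitsets.
final class AuthSegment {
    final List<String> docPaths;
    final Map<String, BitSet> authBitsByToken = new HashMap<>();

    AuthSegment(List<String> docPaths) {
        this.docPaths = new ArrayList<>(docPaths);
    }
}

// Hypothetical index made of several segments; new documents arrive as a new segment.
final class SegmentedAuthIndex {
    private final List<AuthSegment> segments = new ArrayList<>();
    private int bitsetsBuilt = 0;

    void addSegment(List<String> docPaths) {
        // Existing segments and their cached bitsets are left untouched.
        segments.add(new AuthSegment(docPaths));
    }

    // Counts readable documents, building a bitset only for segments
    // that have not seen this token yet.
    int countReadable(String token, Predicate<String> canRead) {
        int total = 0;
        for (AuthSegment segment : segments) {
            BitSet bits = segment.authBitsByToken.get(token);
            if (bits == null) {
                bits = new BitSet(segment.docPaths.size());
                for (int i = 0; i < segment.docPaths.size(); i++) {
                    if (canRead.test(segment.docPaths.get(i))) {
                        bits.set(i);
                    }
                }
                segment.authBitsByToken.put(token, bits);
                bitsetsBuilt++;
            }
            total += bits.cardinality();
        }
        return total;
    }

    int bitsetsBuilt() {
        return bitsetsBuilt;
    }
}
```

After a new segment is added, the next search with a known token only
builds one extra (small) bitset; the older ones are reused as they are.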

Regards Ard

PS: it won't be simple to implement it all :)

[1] https://issues.onehippo.com/browse/HREPTWO-4430

>
> Ian
> On 25 Aug 2010, at 10:19, Ard Schrijvers wrote:
>
>> Hello Ian et al,
