jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Faceted Search Implementation
Date Wed, 25 Aug 2010 11:40:09 GMT

On 25 Aug 2010, at 11:50, Ard Schrijvers wrote:

> On Wed, Aug 25, 2010 at 12:23 PM, Ian Boston <ieb@tfd.co.uk> wrote:
>> Ard,
>> Thank you for the guided tour, most informative.
> 
> You're welcome.
> 
>> We have complex ACLs based on the standard Jackrabbit 2 ACLs, with some additions
>> including external lookup. These change rapidly, so counting by iteration looks like the
>> only way at the moment. We have found that most of the time, where there are more than 10
>> pages of results, no one pages that far, so social engineering is one solution (e.g.
>> "> 1000 items"): we just count up to that number...
> 
> 
> Do you also have something like time-based ACLs (like 'now', which
> changes every millisecond), or do you have 'static' ACLs? If the latter,
> you can follow a quite different approach. Whether it is feasible
> depends on the number of unique ACL rule sets across JCR sessions, how
> large your data set is, and how much time you want to put into it, but:
> 
> 1) Extend the existing SearchIndex.
> 2) When a search is done, you compute for the jcr session ACL some
> kind of 'token' to identify the ACL rule set for that session (users
> with similar rule sets get the same token)


I can see that the approach will work well where the set of auth tokens is small. In our case,
I think we would need 1 bit per group in the system, although we could compute a hash from
the result to accommodate sparseness. We know from previous production deployments of Sakai
that for 100K users there can be 40K groups, which, IIUC, will produce too many auth-token
Lucene bitsets to generate and cache.
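
As a rough illustration of the hashing idea (a sketch, not anything that exists in Jackrabbit): the token could be a digest over the sorted set of group ids a session resolves to, so that sessions with identical memberships share one token. The names AuthTokens and tokenFor, and the choice of SHA-256, are assumptions made for this example only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Jackrabbit or Lucene.
final class AuthTokens {
    private AuthTokens() {}

    /** Derives a hex token from the set of group ids a session belongs to. */
    static String tokenFor(Set<String> groupIds) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Sort first so that iteration order does not affect the token.
            for (String id : new TreeSet<>(groupIds)) {
                digest.update(id.getBytes(StandardCharsets.UTF_8));
                // Separator byte so {"ab","c"} and {"a","bc"} hash differently
                // (assumes group ids never contain a NUL character).
                digest.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

This only bounds the size of a token; the number of distinct tokens is still the number of distinct membership sets, which is the figure I don't have.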

The other problem is that, although the IndexReader has a static set of documents, the ACLs
are not static, so each ACL modification will invalidate the bitset derived from that ACL.
If the root of a subtree changes, all bitsets from the subtree become invalid.
Our repositories are write-many: most if not all of the 100K users can update content and,
if they do any small-group management, the ACLs as well, which means a significant amount of
ACL modification traffic.
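
To make the invalidation cost concrete, here is a minimal sketch of a per-token bitset cache, using plain java.util.BitSet rather than Lucene's classes. AuthBitSetCache, get, onAclChanged and the canRead callback are hypothetical names; dropping every cached bitset on any ACL change is the most conservative policy and is assumed here purely for simplicity.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical sketch: one cache per (immutable) index reader.
final class AuthBitSetCache {
    private final int maxDoc;
    private final Map<String, BitSet> byToken = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    AuthBitSetCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /**
     * Returns the cached bitset for the token, building it on first use by
     * checking every document once. Callers must not mutate the result.
     */
    BitSet get(String token, IntPredicate canRead) {
        return byToken.computeIfAbsent(token, t -> {
            builds.incrementAndGet();
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (canRead.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    /** Conservative invalidation: any ACL change discards all cached bitsets. */
    void onAclChanged() {
        byToken.clear();
    }

    /** Number of full authorization passes performed so far. */
    int builds() {
        return builds.get();
    }
}
```

With a high rate of ACL writes, onAclChanged would fire often, and the next search for each token pays the full per-document authorization pass again.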

Ours is not the typical ECM use case.

I will think about it some more, since I don't really know the real number of unique
auth tokens or the frequency of ACL updates.

Thanks
Ian


> 3) To every ReadOnlyIndexReader, which already holds an in-memory
> deleted bitset, you add an 'authorized bitset'. This means that every
> time a search comes in with a *new* unique token, you have to authorize
> every Lucene Document once to build the auth bitset for that token. This
> shouldn't be too hard. After this, you associate a cached auth bitset
> with this token. Now every other user with the same token also has an
> in-memory cached bitset.
> 4) Your searches are done on your 'extended SearchIndex', which
> consists of a set of Lucene ReadOnlyIndexReaders, which in turn have
> an extra filter for the authorization. Thus, Lucene returns only
> authorized hits.
> 5) Add some API call or something that exposes
> QueryResultImpl#getTotalSize(). This initially returns the
> Lucene hit count but, as you have already made it 'authorized', it
> returns the correct hit count instantly without having to check access
> for every hit. I actually also still have this one open for our repo [1]
> 
> Note that if new documents are added to the repository, all existing
> auth bitsets for all existing ReadOnlyIndexReaders are still valid!
> Only a new index reader is added. For this new one, you'll still need
> to create the auth bitset when a search comes in. But this is
> always a small index containing few nodes.
> 
> Regards Ard
> 
> ps it won't be simple to implement it all :)
> 
> [1] https://issues.onehippo.com/browse/HREPTWO-4430
> 
>> 
>> Ian
>> On 25 Aug 2010, at 10:19, Ard Schrijvers wrote:
>> 
>>> Hello Ian et al,

