jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Faceted Search Implementation
Date Wed, 25 Aug 2010 10:23:49 GMT
Thank you for the guided tour, most informative.
We have complex ACLs based on the standard Jackrabbit 2 ACLs with some additions, including
external lookup. These change rapidly, so counting by iteration looks like the only way at the
moment. We have found that, most of the time, where there are > 10 pages of results,
no one pages that far, so social engineering is one solution (e.g. "> 1000 items"): we
just count up to that number...
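
A minimal sketch of the capped count described above, assuming the hits arrive as a lazy iterator that has already been filtered by ACL. The class and method names are hypothetical; in a JCR setting the iterator might be the NodeIterator returned by QueryResult.getNodes(), but that wiring is not shown here.

```java
import java.util.Iterator;

final class CappedCount {
    /**
     * Walks the iterator and stops as soon as more than `cap` hits have been seen.
     * Returns a value in [0, cap + 1]; cap + 1 means "more than cap".
     */
    public static int countUpTo(Iterator<?> hits, int cap) {
        int seen = 0;
        while (seen <= cap && hits.hasNext()) {
            hits.next();
            seen++;
        }
        return seen;
    }

    /** Renders the count the way a result page might show it, e.g. "> 1000 items". */
    public static String label(Iterator<?> hits, int cap) {
        int n = countUpTo(hits, cap);
        return n > cap ? "> " + cap + " items" : n + " items";
    }
}
```

The loop reads at most cap + 1 hits, so the cost is bounded by the cap rather than by the size of the result set.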

On 25 Aug 2010, at 10:19, Ard Schrijvers wrote:

> Hello Ian et al,
> On Wed, Aug 25, 2010 at 10:43 AM, Ian Boston <ieb@tfd.co.uk> wrote:
>> On 25 Aug 2010, at 07:55, Ard Schrijvers wrote:
>>> Also note that the faceted navigation is exposed including an
>>> authorization filter: thus, we expose faceted navigation with correct,
>>> authorized counts, all blisteringly fast as it is all in Lucene.
>> Ard,
>> I am interested in the counting.
>> Is this done by counting the number of results from a search, maintaining an aggregate
>> counter by events, or by adding a low-level Lucene class to generate the count?
> It is the latter: We have chosen to have access rules based on
> properties on nodes. (Through some 'auto-derived' property that sets
> the path on a node as well, we can also create access rules like
> 'nothing below this folder', but the actual access checking is still
> based on a single property on a node). We have been able to translate
> the access rules for this access manager to Lucene Queries (actually
> very simple ones, and thus very fast ones).
> So, what we have in a nutshell is:
> 1) When traversing the virtual tree structure of faceted navigation,
> the 'fac nav query' grows with new key/value pairs: this is being
> translated into a lucene query.
> 2) The Lucene query from (1) is combined with an Authorization Query
> (which could be a cached BitSet as well, but, we do not have
> performance issues: I tested for > 300.000 documents exposed over
> faceted navigation. It is pretty much instant, even for all kinds of
> range queries)
> 3) I am just about to check in a demosuite/site that exposes (1) and
> (2) as faceted navigation, with an extra filter, that comes from one
> of the jackrabbit queries, like xpath, sql etc. We can expose any
> jackrabbit search over authorized faceted navigation with correct
> counting. (With (3), however, we suffer from notoriously slow range
> queries in jackrabbit, but this is something I can hopefully work on
> in the coming year in the core of jackrabbit.)
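
As a rough illustration of steps (1) and (2), the sketch below intersects a drill-down result with an authorization set and counts each facet value. It uses java.util.BitSet over document ids as a stand-in; it is not the Lucene API and not the implementation discussed in this thread, and all names and sample values are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

final class AuthorizedFacetCounts {
    /**
     * For each facet value, counts the documents that match the current
     * facet query, are readable by the user, and carry that value.
     */
    public static Map<String, Integer> count(BitSet queryMatches,
                                             BitSet authorized,
                                             Map<String, BitSet> docsByFacetValue) {
        // Combine the facet navigation query with the authorization filter once.
        BitSet visible = (BitSet) queryMatches.clone();
        visible.and(authorized);

        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : docsByFacetValue.entrySet()) {
            BitSet bucket = (BitSet) entry.getValue().clone();
            bucket.and(visible);
            counts.put(entry.getKey(), bucket.cardinality());
        }
        return counts;
    }

    /** Convenience helper: builds a BitSet from a list of document ids. */
    public static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }
}
```

Because the authorization set is applied before counting, every count only includes documents the user may read.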
> The online demo here http://www.demo.onehippo.com/  has lots of
> faceted stuff, which is just our jcr exposed faceted navigation. We
> will include (3) shortly, to also show free text search in combination
> with faceted navigation.
> If you log in to the console at:
> https://cms.demo.onehippo.com/console/ with admin06 admin06 and you
> browse for example to:
> /content/documents/hippogogreen/jobfacets
> you can see the different coloured maps: these are virtual jcr nodes.
> We thus just fetch them over jcr. If you want to see the low-level jcr
> properties, you can also go to
> https://cms.demo.onehippo.com/repository/
> same credentials. It is just another jcr view.
> Obviously, as an admin you can destroy the demo: we flush the content
> every 2 hours, but it is still appreciated if you do not completely break it
> through the console :-)
>> I have been looking at generating aggregate counts of facets on large datasets, and
>> have not found a solution other than retrieving all the hits from a search. JR2.1 appears
>> to be entirely lazy in its retrieval of results and hence there are no totals until the entire
>> set is retrieved. That's fine for small result sets, but for large ones it's a killer. At the
>> moment the best we can do is to count up to some number (e.g. 500) and beyond that say there
>> are > 500. Is there a count(*) function in JCR queries?
> There is no count(*). I have stopped testing my faceted navigation
> exposing facets over ranges after 300.000 documents: it kept being
> fast, and I have not added any caching yet. I will add this when needed.
>> I don't think this is a problem specific to Jackrabbit; rather it's a problem for any
>> search index on an ACL'd data set where the range of ACL combinations is greater than the number
>> of items in the set (i.e. the cardinality of the inverted index is so great it's pointless indexing).
> Yes, this is a general authorized searching issue. Some frameworks,
> like Lucene Connectors Framework index documents along with some
> 'authorisation tokens'. Afaics, when you do it at indexing time, this is
> only possible when you have very stable ACLs which hardly ever change,
> and have a couple of 'authorisation groups' where everybody belongs
> to: So, for example, for a shared filesystem in your company, I can
> imagine that there are, say, 3 groups: management, managers and the
> slaves. Now, indexing three tokens extra per document is easy. Before
> querying the index from the LCF, you first ask the connector for a
> token of the current user, et voila, you get authorised search results from,
> say, Solr. *But*, for more complex authorisation rules, or complex
> ACLs, I do not see this as an option. However, I never asked the LCF
> people how they see this.
> Regards Ard
>> Ian
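
The index-time token approach mentioned above can be sketched roughly as follows, assuming each document is stored with a small, stable set of group tokens and the user's tokens are looked up before querying. This is an illustration only, not the Lucene Connectors Framework API; the class names, group names, and document ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class TokenFilter {
    /** A document as it might sit in the index: an id plus its authorization tokens. */
    public static final class Doc {
        final String id;
        final Set<String> tokens;

        public Doc(String id, Set<String> tokens) {
            this.id = id;
            this.tokens = tokens;
        }
    }

    /** Keeps only the hits that share at least one token with the current user. */
    public static List<String> visibleIds(List<Doc> hits, Set<String> userTokens) {
        List<String> ids = new ArrayList<>();
        for (Doc doc : hits) {
            if (!Collections.disjoint(doc.tokens, userTokens)) {
                ids.add(doc.id);
            }
        }
        return ids;
    }
}
```

This only stays cheap while the tokens per document are few and rarely change; rapidly changing ACLs would force re-indexing, which is the limitation raised in the thread.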
