jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Question on jcr:deref usage
Date Fri, 24 Nov 2006 16:57:38 GMT
Lei Zhou wrote:
> Thanks Marcel!
> 
> So it seems that due to the limitation of JCR (no aggregation query 
> support), it would be much slower to support this type of application than 
> RDBMS. 
> 
> Is that a correct assessment? 

An RDBMS certainly provides a wider range of operations through SQL than JCR 
does with its current XPath and SQL syntax. Depending on your needs, some of the 
queries won't be possible in JCR, but others will simply be unnecessary. E.g. in 
JCR you don't have to execute a query to follow a reference; you simply call the 
method Property.getNode().

> Also, to articulate, if I have to present users with a query result 
> view that is categorized (or grouped) by ProductName, I'd have to do the 
> following: 
> 
> 1. Run query #1
>   //element(*, Document)[@Subject = 'Manual' and 
>   jcr:contains(@description, 'maintenance')]
> 
> 2. Iterate through the entire RowIterator (may have thousands of 
>    entries), use Java code to create an aggregated 
>    ProductNames/ProductReference pairs collection 
>    (since JCR doesn't have this type of query),
> 
> 3. No "Order By" clause is used because the ProductReferences won't be in 
>    the same order as the ProductNames; manual sorting is required in Java 
>    post-processing

The same can be achieved in one step:

//element(*, Document)[@Subject = 'Manual' and jcr:contains(@description, 
'maintenance')]/jcr:deref(@ProductReference, *) order by @ProductName

This will return the products that contain matches, ordered by product name.
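
As a rough sketch of how an application might assemble that statement, the 
helper below builds the same XPath string from a subject and a search term. The 
class and method names are made up for illustration, and the quoting rule 
(doubling single quotes inside a string literal) is an assumption that does not 
cover the separate syntax of the jcr:contains search expression.

```java
/** Hypothetical helper that assembles the single-step jcr:deref statement. */
final class DerefQueryBuilder {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds the statement from the message: match documents, follow
     * @ProductReference to the referenced nodes, order by @ProductName.
     * The jcr:deref step is copied as written in the message.
     */
    static String productsWithMatches(String subject, String searchTerm) {
        return "//element(*, Document)[@Subject = " + quote(subject)
                + " and jcr:contains(@description, " + quote(searchTerm) + ")]"
                + "/jcr:deref(@ProductReference, *) order by @ProductName";
    }
}
```

The resulting string would then be handed to QueryManager.createQuery(statement, 
Query.XPATH) and executed; QueryResult.getNodes() iterates over the returned 
nodes.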

> 4. Depending on which category has been selected by the user to expand, run 
>    query #2, limiting results to that single product category:
>    (query #2)
>   //element(*, Document)[@Subject = 'Manual' and 
>   jcr:contains(@description, 'maintenance') and 
>   @ProductReference = '<uuid-of-Product-#1>']

Correct.

> 5. Again, product names have to be de-referenced manually, and ordering has 
>    to be moved from the query to the Java post-processing

This step I don't understand. What's the purpose of this step and why is it 
needed? Isn't all information already available?

> I'm fairly new to JCR and Jackrabbit. I've found them very helpful in many 
> aspects of managing content. But I do feel that certain improvements 
> could make Jackrabbit a better choice for enterprise use. 
> 
> #1. In the many years of enterprise application development, I've seen a 
> lot of our content based applications in need of support for complicated 
> search, e.g, search by arbitrary combination of document properties, and 
> grouping of search results (it is not uncommon to see 2, even 3 levels of 
> nested grouping). 
>      -- Aggregations and Joins are definitely a big plus for querying a 
> complicated content model.

Such requirements are also discussed in the expert group of JSR 283. You can 
comment on the current spec and post enhancement wishes to jsr-283-comments@jcp.org.

> I've seen posts mentioning use of Node references to compensate for the lack 
> of SQL Join, but what if I need to perform a search like below 
> (ProductNames, Regions and AvailableFors would most likely be categories 
> that are referenced by all documents): 
>     FIND all manuals
>     THAT (ProductName is 'TV' or 'VCR' or 'DVD') 
>          and (Region is 'North America' or 'Europe') 
>          and (AvailableFor is 'distributor' or 'repairHouse')
>      GROUP BY Region, ProductName

Such a query is certainly not possible with the current set of XPath or SQL in 
JCR. You would have to break up the query into multiple queries, e.g. retrieve 
the uuids for products with names 'TV', 'VCR' and 'DVD' and use those uuids in a 
query. The same applies to Region and AvailableFor.
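
A minimal sketch of the second half of that approach, assuming the uuids for 
each category have already been collected by earlier queries. The class and 
method names are made up, and the property names RegionReference and 
AvailableForReference are hypothetical (only ProductReference appears in this 
thread). Grouping by Region and ProductName would still have to happen in Java 
afterwards.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that turns pre-resolved uuids into one XPath statement. */
final class MultiReferenceQuery {

    /** Returns e.g. "(@prop = 'a' or @prop = 'b')" for a reference property. */
    static String anyOf(String property, List<String> uuids) {
        if (uuids.isEmpty()) {
            throw new IllegalArgumentException("no uuids for " + property);
        }
        return uuids.stream()
                .map(uuid -> "@" + property + " = '" + uuid + "'")
                .collect(Collectors.joining(" or ", "(", ")"));
    }

    /** Combines the three category filters with "and". */
    static String manuals(List<String> productUuids,
                          List<String> regionUuids,
                          List<String> availableForUuids) {
        return "//element(*, Document)[@Subject = 'Manual' and "
                + anyOf("ProductReference", productUuids) + " and "
                // Assumed property names, not taken from the thread:
                + anyOf("RegionReference", regionUuids) + " and "
                + anyOf("AvailableForReference", availableForUuids) + "]";
    }
}
```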

IMO XQuery would be a nice fit for those requirements.

> #2. The RDBMS based repository, current DB schema is not very convincing 
> for large enterprise level applications. A more normalized schema might 
> help both performance and #1, but yes, more DB level code may be needed 
> (for performance's sake) and that may limit the portability of the 
> product. 

I'm not sure that's really the case. Usually a normalized schema means lower 
performance. There were attempts to create a persistence manager using a 
normalized schema, but in the end the currently used schema turned out to be the 
most practical one.

regards
  marcel
