jackrabbit-dev mailing list archives

From Edgar Poce <edgarp...@gmail.com>
Subject Re: Chained Persistence and Filesystem
Date Sat, 05 Mar 2005 18:45:43 GMT
Hi david

David Nuescheler wrote:
> hi edgar,
> 
> 
>>I'm far (really far) from being a db expert, and I still don't get the
>>big picture of jackrabbit internals, but I think jackrabbit is unable,
>>and probably will always be, to perform aggregated queries as fast as
>>any of the os or commercial dbs.
> 
> why would that be? the querymanager could be extended
> with an additional query language that does a direct 
> pass-through (i would not necessarily suggest that. in 
> agreement with jukka's statement about trying to avoid 
> relying on the structure of the persistence layer).
> also, i would argue that depending on the nature of the 
> query jackrabbit may have a more adequate index and 
> therefore may easily outperform a classical rdbms.
> 
I hadn't seen either of these options. It's a little clearer now :). Thanks.
I took a look at the jcr searching chapter and I didn't find any 
reference to aggregate functions. A minimal set of functions (avg, 
mean, std, count, min and max, among others) would be enough for 
descriptive statistics, and a pluggable mechanism for user-defined 
functions would be very useful for more complex calculations, something 
like UDFs in mysql. See http://dev.mysql.com/doc/mysql/en/adding-udf.html.
Is this in the scope of jcr? And how do I achieve this functionality 
through the jcr api in its current state?
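
As an illustration only, the kind of pluggable aggregate mechanism described above could be sketched on the client side as follows. This is not part of jcr or jackrabbit: the names AggregateRegistry, register and apply are hypothetical, and the sketch assumes the numeric values have already been read out of the query results into an array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical client-side registry of aggregate functions (not a JCR API).
final class AggregateRegistry {
    private final Map<String, ToDoubleFunction<double[]>> functions = new HashMap<>();

    AggregateRegistry() {
        // Built-in descriptive statistics; empty input yields NaN (count yields 0).
        register("count", v -> v.length);
        register("min", v -> Arrays.stream(v).min().orElse(Double.NaN));
        register("max", v -> Arrays.stream(v).max().orElse(Double.NaN));
        register("avg", v -> Arrays.stream(v).average().orElse(Double.NaN));
        register("std", AggregateRegistry::sampleStd);
    }

    // Plug in a user-defined function under a name, replacing any previous one.
    void register(String name, ToDoubleFunction<double[]> fn) {
        functions.put(name, fn);
    }

    // Apply a registered function to values already extracted from query results.
    double apply(String name, double[] values) {
        ToDoubleFunction<double[]> fn = functions.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("Unknown aggregate: " + name);
        }
        return fn.applyAsDouble(values);
    }

    // Sample standard deviation (divides by n - 1); NaN for fewer than two values.
    static double sampleStd(double[] v) {
        if (v.length < 2) {
            return Double.NaN;
        }
        double mean = Arrays.stream(v).average().orElse(Double.NaN);
        double sumSquares = 0.0;
        for (double x : v) {
            sumSquares += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSquares / (v.length - 1));
    }
}
```

A user-defined function would then be added with something like registry.register("range", v -> ...) and called by name. Computing aggregates this way means every value crosses the api, so it says nothing about how fast an aggregate evaluated inside the repository could be.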

> you mentioned reporting, in reporting in general 
> performance usually is not so much an issue, since
> in many cases reporting is not done very frequently.
> 
I agree, even in some cases where it's done frequently. In my project 
I'm using a query handler which is responsible for executing queries. 
Depending on the estimated execution time, it uses a synchronous or an 
asynchronous strategy, with a single thread or multiple threads. Query 
results are persisted (here I use jackrabbit too) and will be available 
for sync reporting the next time.
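
A minimal sketch of such a handler, under stated assumptions, might look like this. The class AdaptiveQueryHandler, its threshold parameter and the in-memory map are hypothetical: the map merely stands in for results persisted in the repository, and the estimated time is assumed to be supplied by the caller.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical handler: runs cheap queries inline, expensive ones on a pool,
// and keeps results so that later requests can be answered synchronously.
final class AdaptiveQueryHandler<T> {
    private final long syncThresholdMillis;
    private final ExecutorService pool;
    // Stand-in for persisted results; query results must be non-null.
    private final Map<String, T> store = new ConcurrentHashMap<>();

    AdaptiveQueryHandler(long syncThresholdMillis, int threads) {
        this.syncThresholdMillis = syncThresholdMillis;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    CompletableFuture<T> execute(String key, long estimatedMillis, Supplier<T> query) {
        // A stored result is served immediately, whatever the estimate.
        T stored = store.get(key);
        if (stored != null) {
            return CompletableFuture.completedFuture(stored);
        }
        // Cheap query: run on the caller's thread and return a completed future.
        if (estimatedMillis <= syncThresholdMillis) {
            T result = query.get();
            store.put(key, result);
            return CompletableFuture.completedFuture(result);
        }
        // Expensive query: run on the pool; the result is stored before completion.
        return CompletableFuture.supplyAsync(() -> {
            T result = query.get();
            store.put(key, result);
            return result;
        }, pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The sketch does not deduplicate two concurrent requests for the same expensive query, and it never invalidates stored results; a real handler would need a policy for both.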

> since i am very interested in making performance
> improvements to jackrabbit's query engine where
> necessary, i would really be interested in the nature
> and frequency of your queries, so we have something
> to base optimization on. do you think you can 
> share details?
> 
There are two kinds of queries: one for microdata processing (mostly 
async reporting), which is where I need aggregate functions, and another 
for sync reporting. I still don't know the frequency of the queries 
because the system is not yet in production and will not be for a few 
months. Aggregate queries will cover anywhere from 0 cases to the entire 
population size times the number of waves. For now I'm handling only one 
household survey with 100k per wave, approximately 6 million cases in total.
I made a pluggable mechanism for adding source providers and I plan to 
add many more sources, censuses and mainly household surveys.

> jsr-170 specifies persistent queries, which allows
> for substantial optimizations. do you think an 
> extension of the persistent queries could help you
> with respect to performance?
> 
I still don't know, but I will take a look at the jcr doc and the 
jackrabbit implementation and see if it fits my needs. Thanks for the hint.

> regards,
> david
> 

Thank you very much
edgar
