jackrabbit-users mailing list archives

From Marcin Nowak <marcin.j.no...@comarch.com>
Subject Re: eXist
Date Mon, 23 Apr 2007 08:39:08 GMT
Hi,

First of all, my intention was definitely not to troll. I am looking
for the best solution for XML storage. My favourite is Jackrabbit, but
I've found something that in my opinion performs better, and I am only
asking why. I really want to use Jackrabbit, and I like its versioning and
referencing features, but I need it to be a high-performance XML store.

In fact my question was based on short testing, but not just 5 minutes
:) In eXist I created a repository containing three collections nested in
each other, each holding three 4.5 MB XML files. Import times are
impressive: a 4.5 MB XML file takes ca. 10 seconds, while the same import
into Jackrabbit took ca. 16 minutes on the same machine. Would you agree
that this is good? If not, show me how to configure Jackrabbit to perform
that well. Again, please don't take it as trolling - **I really want
to know how to configure Jackrabbit to be high-performance**. Then I
launched a query, which was really simple:

for $x in //type where $x='STRING_SINGLE'
return $x

It was performed on the whole DB - correct me if I am wrong. I received
the query results in less than 4 seconds.
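
For anyone who wants to check what the query selects: the FLWOR expression above returns the same nodes as the plain XPath `//type[. = 'STRING_SINGLE']`. Below is a minimal sketch that runs that XPath with the JDK's built-in XPath API only. It uses neither eXist nor Jackrabbit and says nothing about their performance; the class name `TypeQuerySketch` and the sample document are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of eXist or Jackrabbit.
final class TypeQuerySketch {

    // XPath equivalent of: for $x in //type where $x='STRING_SINGLE' return $x
    static final String QUERY = "//type[. = 'STRING_SINGLE']";

    // Returns the text content of every <type> element equal to STRING_SINGLE.
    static List<String> matches(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    QUERY,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data; the real test files were 4.5 MB each.
        String sample = "<fields>"
                + "<field><type>STRING_SINGLE</type></field>"
                + "<field><type>INT</type></field>"
                + "<group><field><type>STRING_SINGLE</type></field></group>"
                + "</fields>";
        System.out.println(matches(sample).size() + " matching <type> elements");
    }
}
```

The `//` axis visits every element in the document, so without an index on `type` the cost grows with the size of the data being searched.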

I know very well how Jackrabbit performs in its default configuration and
on Derby, MySQL, and Oracle. You can see the results of my tests
somewhere here in the mailing list archives; I published a detailed report
some time ago. After that report I ran those tests again because of
changes made in the Jackrabbit source code. The results were better, but in
comparison to eXist they are, again, not too optimistic.

My main question is: is there anything that can speed up Jackrabbit
enough to get close to the performance achieved with eXist? Please take
this question seriously - performance is one of the main requirements for
the XML storage I need.

BR,
Marcin Nowak

Jean-Baptiste Quenot wrote:
> * Marcin Nowak:
>
>   
>> Recently I've  discovered XML database quite  similar in general
>> concepts to Jackrabbit,  in fact it does  not provide versioning
>> and  referencing  between  nodes  but   it  is  really  fast  as
>> I  compared  it  with  Jackrabbit, especially  in  querying  and
>> importing nodes, question is why Jackrabbit performs so badly in
>> comparison to eXist?
>>     
>
> You're asking  for a troll very  obviously, so I won't  comment on
> it, but there are a few things that are worth mentioning:
>
> 1. eXist  is  an XML  database,  Jackrabbit  is  not, so  you  are
>    comparing two  unrelated things.   Moreover, even if  the query
>    syntax can look similar, eXist returns XML, whereas JCR returns
>    Java objects.  You need to understand the implications of this,
>    namely parsing the resulting XML and working with it can quickly
>    lead to  memory and CPU  starvation, especially when  the query
>    returns a lot of documents.  JCR  plays nicely with this, as it
>    returns an iterator on the data set.
>
> 2. Jackrabbit is  mostly seen  as a Java-API,  whereas eXist  is a
>    standalone beast with specific servlets that talk xmlrpc, REST,
>    and  so  on mostly  accessed  using  HTTP requests  causing  an
>    additional  overhead.  eXist  even  has a  front-end  based  on
>    Cocoon.  A  *lot* of caching is  done on the eXist  side, while
>    with Jackrabbit you will need  a second-level cache in your own
>    code to address that.
>
> 3. In my  book, eXist is not  designed to let you  query the whole
>    database at  once, whereas  Jackrabbit allows  you to  return a
>    sorted  subset  of documents  from  the  whole repository  very
>    efficiently,  by design.   Accessing one  XML document  is very
>    different from querying the whole database with 10k+ documents.
>    Play with eXist more than 5 minutes with a serious data set and
>    you will notice by yourself.
>   
> 4. Jackrabbit's efficiency  at importing nodes depends  largely on
>    the persistence  and filesystem  implementation you  are using.
>    For example I've seen the  BDB storage backend perform 10 times
>    faster than the XML-file-based one.
>
> 5. When  you compare  two approaches  (one XML  database, one  JCR
>    repository) for your own usecase, and moreover when you ask for
>    feedback about  your experiments,  publish the results  of your
>    benchmarks, be very  careful to mention *what*  you tested, and
>    *how*.  You also need to mention of course the numeric figures.
>    Otherwise you're just spreading FUD.
>
> Cheers,
>   
