jackrabbit-users mailing list archives

From "Seidel. Robert" <Robert.Sei...@aeb.de>
Subject RE: my managers are against jackrabbit
Date Thu, 16 Sep 2010 11:12:41 GMT
Hi,

>You're probably doing something wrong. A basic performance test in the 
>Jackrabbit benchmark suite (see test/performance under Jackrabbit
>trunk) can create 5000 new nodes per second on my mid-level desktop 
>computer.

I've done some testing, but nothing I tried improved performance.

About the performance:

I'm using a clustered datastore (Oracle persistence manager).
I'm using versionable nodes.
I save the session after each node is created.

These things are necessary for my use case.

With this setup, even if I only set a few properties (about 10) and store no binaries (so there is
no full-text indexing of binary content at all), performance does not get above about 5 nodes per second.

It starts quite well, but after storing about 10,000 nodes the speed drops drastically. Saving less
often would help, but I can't: each node creation is a separate transaction in my use case.
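To see where the slowdown sets in, it can help to log throughput per window of operations rather than one overall average (at 5 nodes per second, each node takes about 200 ms end to end). The following is a minimal stdlib-only Java sketch; the class name ThroughputProbe is hypothetical, and the operation passed in is a placeholder where the real per-node work (create the node, set its properties, save the session) would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class ThroughputProbe {
    /**
     * Runs op for i = 0 .. total-1 and returns the measured rate
     * (operations per second) for each consecutive window of windowSize operations.
     * A trailing partial window is ignored.
     */
    public static List<Double> ratesPerWindow(int total, int windowSize, IntConsumer op) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        List<Double> rates = new ArrayList<>();
        long windowStart = System.nanoTime();
        for (int i = 0; i < total; i++) {
            op.accept(i);
            if ((i + 1) % windowSize == 0) {
                long now = System.nanoTime();
                // Guard against a zero-length interval for very fast placeholder operations
                double seconds = Math.max(now - windowStart, 1L) / 1e9;
                rates.add(windowSize / seconds);
                windowStart = now;
            }
        }
        return rates;
    }

    public static void main(String[] args) {
        List<Integer> store = new ArrayList<>();
        // Hypothetical placeholder: a real test would create one versionable node,
        // set its properties and save the session here.
        IntConsumer createNode = i -> store.add(i);
        List<Double> rates = ratesPerWindow(20_000, 1_000, createNode);
        for (int w = 0; w < rates.size(); w++) {
            System.out.printf("nodes %6d-%6d: %.0f nodes/s%n",
                    w * 1_000, (w + 1) * 1_000 - 1, rates.get(w));
        }
    }
}
```

Comparing the per-window rates shows whether throughput is uniformly low from the start or falls off as the repository grows, which is the pattern described above.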

>> o    If the database crashes, then everything is lost - you have to recover from
>> database backup and store the work of the day again

> You can use clustering for high availability.

But that doesn't help if the central database that all cluster nodes share (bundle database
persistence manager) crashes.

>> Is there a way to improve storage performance/index size?

>Random strings are not best suited for an inverted index like Lucene.
>If you don't need the ability to search your nodes based on these 
>strings, you can disable indexing of those properties with a custom 
>indexing configuration.

I've tested with only one random 5-character ASCII string per node plus about 10 other properties
that have exactly the same (short string) values on every node, and the index (repository and
workspace together) was about 50 MB for 20,000 nodes.
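For comparison with the earlier 300,000-node test quoted below, the per-node index size works out as follows (assuming MB and GB are binary units, 2^20 and 2^30 bytes):

$$\frac{50 \times 2^{20}\ \text{bytes}}{20\,000\ \text{nodes}} \approx 2\,621\ \text{bytes/node}, \qquad \frac{2 \times 2^{30}\ \text{bytes}}{300\,000\ \text{nodes}} \approx 7\,158\ \text{bytes/node}$$

That is a reduction of only about 2.7 times per node (7,158 / 2,621), although the random text per node dropped from 36 x 20 = 720 characters to 5.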

Kind regards,

Robert

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Monday, 13 September 2010 19:10
To: users@jackrabbit.apache.org
Subject: Re: my managers are against jackrabbit

Hi,

On Mon, Sep 13, 2010 at 6:45 PM, Seidel. Robert <Robert.Seidel@aeb.de> wrote:
> o    In our test we could store 3-4 nodes with data a second (with 40 properties
> and a 400 byte clob (with versioning))

You're probably doing something wrong. A basic performance test in the
Jackrabbit benchmark suite (see test/performance under Jackrabbit
trunk) can create 5000 new nodes per second on my mid-level desktop
computer.

What's your persistence configuration? Another possible cause is the
high amount of time spent indexing your content, see more below.

> o    Clustering doesn't help because it doesn't scale storage performance

Clustering helps notably for read workloads, but won't help with write access.

> o    If the database crashes, then everything is lost - you have to recover from
> database backup and store the work of the day again

You can use clustering for high availability.

> o    The index size for 300,000 nodes was really huge, it was about 2 GB (36 of
> the 40 properties are random Unicode strings with a size of 20 characters)
> o    The repository index was about 1.6 GB and the workspace index was
> something like 400 MB (versioning)
> Is there a way to improve storage performance/index size?

Random strings are not best suited for an inverted index like Lucene.
If you don't need the ability to search your nodes based on these
strings, you can disable indexing of those properties with a custom
indexing configuration.
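Why random strings hurt an inverted index can be seen in a toy model. The sketch below is a hypothetical in-memory index, not Lucene or Jackrabbit code: it treats each distinct property:value pair as one term (real Lucene analyzers tokenize text, which this ignores). Random values add a new term for almost every node, while values shared across nodes collapse into a handful of terms. Excluding the random properties models what a custom indexing configuration would achieve in Jackrabbit; the class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class InvertedIndexSketch {
    // term ("property:value") -> ids of the nodes containing it (the postings list)
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Indexes every property of one node as a single term, skipping excluded property names. */
    public void indexNode(int nodeId, Map<String, String> properties, Set<String> excluded) {
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (excluded.contains(e.getKey())) continue;
            String term = e.getKey() + ":" + e.getValue();
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(nodeId);
        }
    }

    /** Number of distinct terms, a rough proxy for how the index grows. */
    public int vocabularySize() {
        return postings.size();
    }

    static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append((char) ('a' + rnd.nextInt(26)));
        return sb.toString();
    }

    /** Builds an index over nodes that each carry randomProps random 20-char values and fixedProps shared values. */
    public static InvertedIndexSketch build(int nodes, int randomProps, int fixedProps,
                                            Set<String> excluded, long seed) {
        Random rnd = new Random(seed);
        InvertedIndexSketch index = new InvertedIndexSketch();
        for (int n = 0; n < nodes; n++) {
            Map<String, String> props = new HashMap<>();
            for (int p = 0; p < randomProps; p++) props.put("random" + p, randomString(rnd, 20));
            for (int p = 0; p < fixedProps; p++) props.put("fixed" + p, "value" + p);
            index.indexNode(n, props, excluded);
        }
        return index;
    }

    public static void main(String[] args) {
        Set<String> randomNames = new HashSet<>();
        for (int p = 0; p < 36; p++) randomNames.add("random" + p);
        System.out.println("all properties indexed: " + build(1000, 36, 4, Set.of(), 42).vocabularySize());
        System.out.println("random properties excluded: " + build(1000, 36, 4, randomNames, 42).vocabularySize());
    }
}
```

With everything indexed, the vocabulary grows by roughly one term per random value (on the order of 36,000 terms for 1,000 nodes); with the random properties excluded, only the 4 shared terms remain no matter how many nodes are added.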

BR,

Jukka Zitting
