jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lemes, Fernando" <fernando.le...@hp.com>
Subject RE: MySQL without primary keys.
Date Mon, 16 Mar 2015 19:10:43 GMT

   Hello Bart,

   We are expecting to have a Jackrabbit cluster on a really heavy load of both read and write
operations, so we are really concerned about performance. Currently we are in a "pre-production"
stage and we are planning to use Oak soon, but we will probably have to maintain this version
for some time with thousands of users.


===== GLOBAL_REVISION TABLE =====

   Regarding the performance of the query, here are some numbers for 'JOURNAL_GLOBAL_REVISION
set REVISION_ID = REVISION_ID + 1' that I got with JMeter:

   Original: Average time 45 ms with std. dev. of 26 ms
   Modified: Average time 37 ms with std. dev. of 16 ms


   1) Original table

   Create table SQL: 'CREATE TABLE `JOURNAL_GLOBAL_REVISION` (  `REVISION_ID` bigint(20) NOT
NULL,  UNIQUE KEY `JOURNAL_GLOBAL_REVISION_IDX` (`REVISION_ID`)) ENGINE=InnoDB DEFAULT CHARSET=latin1'

   Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION SET REVISION_ID = REVISION_ID
+ 1;': 
   # id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
   '1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION', 'index', NULL, 'JOURNAL_GLOBAL_REVISION_IDX',
'8', NULL, '1', '100.00', 'Using temporary'
                                                                                         
                                                                                         
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^


   2) Modified table (added a PK with auto increment, this makes the query execution plan
changes)

   Create table SQL: ''CREATE TABLE `JOURNAL_GLOBAL_REVISION_MOD` (  `REVISION_ID` bigint(20)
NOT NULL,  `c` int(10) unsigned NOT NULL AUTO_INCREMENT,  PRIMARY KEY (`c`)) ENGINE=InnoDB
AUTO_INCREMENT=1 DEFAULT CHARSET=latin1'

   Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION SET REVISION_ID = REVISION_ID
+ 1;' :

   # id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
   '1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION_MOD', 'index', NULL, 'PRIMARY', '4', NULL, '1',
'100.00', NULL

   It's not a huge difference, but this modification would only require a simple change on
the journal mysql.ddl file and there will be no migration scenarios.



===== LOCAL_REVISION TABLE =====

   On the LOCAL_REVISION table I just added two extra timestamp columns with the creation
and last modified timestamps (MySQL has a feature which auto updates the last modified timestamp,
but for portability a NOW() function could be used or something similar).

   To detect a "dead node" I select the entries that are not updated for a few days (a new
janitor parameter was added for that) and has the revision_id is less than the current global
revision id. These nodes are being removed before the janitor selects until which revision
the journal will be cleaned.

   We are currently using MySQL (Percona), and this is working fine (we destroy and create
new nodes everytime), but we didn't verified the solution for other implementations/DBs.



   Regards,

Fernando Lemes da Silva
Software Designer – Brazil R&D
fernando.lemes@hp.com
+55-11-4197-8124

-----Original Message-----
From: Bart van der Schans [mailto:b.vanderschans@onehippo.com] 
Sent: quinta-feira, 12 de março de 2015 21:36
To: Jackrabbit Dev
Subject: Re: MySQL without primary keys.

Hi Frenando,

On Mon, Mar 2, 2015 at 7:09 PM, Lemes, Fernando <fernando.lemes@hp.com> wrote:
>
>
>    Hello guys,
>
>
>
>    I started to work with Jackrabbit a few months ago and in the 
> project I’m working we are using Jackrabbit 2.8.0 with MySQL as the persistent manager.
> We found some issues regarding the database schema. There are no 
> primary keys at the journal tables.. is there a reason for that? It 
> seems that in a MySQL clustered environment this is important for the nodes synchronization.

I'm not sure if there is a specific reason for that. The tables are indeed used for synchronizing
the cluster. A lock on the table is used to serialize the updates in the cluster.


>    Also we have realized that updating the journal GLOBAL_REVISION 
> table (which is updated all the time) had a performance issue. By 
> examining the query execution plan we found a temporary table being 
> created. As a workaround we just added a dummy column with auto 
> increment value as the primary key, but shouldn’t this table consists only of a column
like that?

I remember I noticed the temporary table creation as well during some investigation. It has
been  a while ago but if I remember correctly for our situation it didn't turn out to be a
real performance issue.
Our situation of course might be wildly different from yours ;-) Do you have any numbers on
the performance gain by just adding the dummy auto increment column? Which MySQL storage engine
are you using?

If the revision id column would be an auto incremented value it would mean the "ownership"
of the revision number (generation) would be transferred from Jackrabbit to the database.
That could impact the portability of the code to other backends or the flexibility in Jackrabbit
for handling the revision ids. I'm not sure if that would be a real issue. However changing
the storage format in MySQL would have an impact for existing installations and require a
migration of the schema when upgrading.

>    We have also implemented a feature to remove Jackrabbit dead nodes 
> from the LOCAL_REVISION table, so the janitor can work properly even 
> in a scenario where cluster nodes may be destroyed anytime. How can I 
> submit this patch?

Could you explain how your solution works? How do you decide if a node is just shutdown for
a while or "dead"?

Regards,
Bart

>
>
>
>
>
>    Regards,
>
>
>
> Fernando Lemes da Silva
>
> Software Designer – Brazil R&D
>
> fernando.lemes@hp.com
>
> +55-11-4197-8124
>
>
Mime
View raw message