jackrabbit-users mailing list archives

From "Seidel. Robert" <Robert.Sei...@aeb.de>
Subject Re: about removing Old Revisions from journal table.
Date Wed, 29 May 2013 07:59:34 GMT

#2 - you can ignore this one. The janitor deletes only entries that are older than the lowest
entry in the local revisions table. So only a newly set up cluster node that has never written
its local revision entry at all would be affected.
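
As a rough illustration of that rule, here is a minimal Python sketch. It is not Jackrabbit's
actual code: the function name, the plain list standing in for the journal table and the dict
standing in for the local revisions table are all assumptions made for the example, as is the
choice to purge nothing when no node has recorded a revision yet.

```python
def purge_old_revisions(journal, local_revisions):
    """Return the journal revisions that survive one clean-up pass.

    journal: list of revision numbers (hypothetical stand-in for the journal table)
    local_revisions: dict of node id -> last revision that node has recorded
    """
    if not local_revisions:
        # Assumption for this sketch: with no recorded revisions, purge nothing.
        return list(journal)
    lowest = min(local_revisions.values())
    # Only entries strictly older than the lowest recorded revision are deleted.
    return [rev for rev in journal if rev >= lowest]


journal = [1, 2, 3, 4, 5, 6]
local_revisions = {"node-a": 4, "node-b": 6}

remaining = purge_old_revisions(journal, local_revisions)
purged = [rev for rev in journal if rev not in remaining]
print(remaining, purged)
```

In this example revisions 1 to 3 are purged. A node that had not yet written its local revision
entry would still expect to read them, which is the only situation caveat #2 is about.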

Regards, Robert

-----Original Message-----
From: liang cheng [mailto:lcheng.nj@gmail.com]
Sent: Wednesday, 29 May 2013 09:27
To: dev@jackrabbit.apache.org
Cc: users@jackrabbit.apache.org
Subject: about removing Old Revisions from journal table.

 Hi, all
   In our production environment, the Jackrabbit journal table becomes large (more than
100,000 records) after running for two weeks. As a result, we plan to use the janitor thread
to remove old revisions, as described in the "Removing Old Revisions" section of
http://wiki.apache.org/jackrabbit/Clustering.
  Once it is enabled, there are several caveats, which the wiki page also mentions:
       1. If the janitor is enabled then you lose the possibility to easily add cluster nodes.
(It is still possible but takes detailed knowledge of Jackrabbit.)
       2. You must make sure that all cluster nodes have written their local revision to the
database before the clean-up task runs for the first time, because otherwise cluster nodes
might miss updates (because they have been purged) and their local caches and search indexes
get out of sync.
       3. If a cluster node is removed permanently from the cluster, then its entry in the
LOCAL_REVISIONS table should be removed manually. Otherwise, the clean-up thread will not be
effective.
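
  To make caveat 3 concrete, here is a small Python sketch of why a leftover entry matters. It
assumes the clean-up only removes journal entries older than the lowest value recorded in
LOCAL_REVISIONS; the function name, node names and revision numbers are made up for
illustration and are not taken from Jackrabbit.

```python
def purgeable(journal, local_revisions):
    """Revisions a clean-up pass could delete under the assumed rule."""
    lowest = min(local_revisions.values())
    return [rev for rev in journal if rev < lowest]


journal = list(range(1, 11))  # revisions 1..10
local_revisions = {"node-a": 9, "node-b": 10, "retired-node": 2}

# The retired node never advances, so its stale entry pins the minimum at 2.
before = purgeable(journal, local_revisions)

# After its entry is removed by hand, the minimum moves up to 9.
del local_revisions["retired-node"]
after = purgeable(journal, local_revisions)

print(len(before), len(after))
```

  With the stale entry in place only one revision can be removed; once it is deleted, eight
can.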

  I can understand point #3, but I am not quite sure about #1 and #2.

  #1 is our biggest concern. In our production environment, there are cases where we need to
add new cluster node(s), e.g. if system capacity cannot handle the current workload, or if a
running node needs to be stopped for a while for maintenance and a new node has to be added.
For #1, the wiki only says that "you lose the possibility to easily add cluster nodes", but
does not explain the reason. As far as I know, when a new node is added to the JR cluster it
has no Lucene index, so Jackrabbit builds the index for all nodes of the current repository
(starting from the root node). After this step, Jackrabbit processes the revisions generated
by the other nodes. *I wonder what issues could arise from processing old revisions when the
cache and indexes already reflect the latest repository content?*

  For #2, *does it mean that manual work is needed to keep things consistent?*

  Although the wiki page gives one approach for adding a new cluster node manually (i.e. clone
the indexes and local revision number from an existing node), we still hope there is a safe
programmatic way to avoid the manual work, because our production system is deployed on Amazon
EC2 and adding a new node needs to be as easy as possible.

  Could you please comment on my concerns? Thanks.



