jackrabbit-users mailing list archives

From liang cheng <lcheng...@gmail.com>
Subject about removing Old Revisions from journal table.
Date Wed, 29 May 2013 07:26:58 GMT
 Hi all,
   In our production environment, the Jackrabbit journal table grows large
(more than 100,000 records) after running for two weeks. As a result, we
plan to use the janitor thread to remove old revisions, as described in
http://wiki.apache.org/jackrabbit/Clustering#Removing Old Revisions.
  After enabling it, there are several caveats, as mentioned on the wiki
page:
       1. If the janitor is enabled then you lose the possibility to
easily add cluster nodes. (It is still possible but takes detailed
knowledge of Jackrabbit.)
       2. You must make sure that all cluster nodes have written their
local revision to the database before the clean-up task runs for the first
time, because otherwise cluster nodes might miss updates (because they
have been purged) and their local caches and search indexes get out of
sync.
       3. If a cluster node is removed permanently from the cluster, then
its entry in the LOCAL_REVISIONS table should be removed manually.
Otherwise, the clean-up thread will not be effective.
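
  For reference, this is how we plan to enable the janitor in our
repository.xml — a minimal sketch of the DatabaseJournal parameters, where
the cluster id, connection details, and schedule values are placeholders
for our setup, not recommendations:

```xml
<!-- Sketch of the Cluster/Journal section with the revision janitor
     enabled; driver, url, user, and password are placeholders. -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://dbhost/jackrabbit"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="secret"/>
    <!-- enable the clean-up (janitor) thread -->
    <param name="janitorEnabled" value="true"/>
    <!-- sleep time between clean-up runs, in seconds (here: one day) -->
    <param name="janitorSleep" value="86400"/>
    <!-- hour of day for the first run (here: 3 a.m.) -->
    <param name="janitorFirstRunHourOfDay" value="3"/>
  </Journal>
</Cluster>
```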

  I can understand point #3, but I am not quite sure about #1 and #2.
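
  To make #3 (and the precondition in #2) concrete, here is my
understanding in SQL. This assumes the default DatabaseJournal table names
JOURNAL and LOCAL_REVISIONS (a configured schemaObjectPrefix would prefix
these) and a permanently removed node with id 'node2' as a placeholder:

```sql
-- For #2: check which revision each cluster node has consumed; every
-- node should be close to the global maximum before the janitor's
-- first run, otherwise purged revisions would be missed.
SELECT JOURNAL_ID, REVISION_ID FROM LOCAL_REVISIONS;
SELECT MAX(REVISION_ID) FROM JOURNAL;

-- For #3: remove the entry of a permanently decommissioned node so
-- the janitor is not held back by its stale local revision.
DELETE FROM LOCAL_REVISIONS WHERE JOURNAL_ID = 'node2';
```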

  #1 is our biggest concern. In our production environment, we have cases
where we need to add new cluster node(s), e.g. if the system capacity
cannot handle the current workload, or if a running node needs to be
stopped for a while for maintenance and a new node needs to be added. #1
only says that "you lose the possibility to easily add cluster nodes",
but doesn't explain the reason. As I understand it, when a new node is
added to the JR cluster, there is no Lucene index yet, so Jackrabbit
builds the index for the whole current repository (starting from the root
node). After this step, Jackrabbit then processes the revisions generated
by other nodes. *I wonder what the possible issue is when processing old
revisions with the latest repository content already in the cache and
index.*

  For #2, *does it mean any manual work is needed to keep consistency?*

  Although the wiki page gives one approach to adding a new cluster node
manually (i.e. cloning the indexes and local revision number from an
existing node), we still hope there is some safe programmatic way to
avoid the manual work, because our production system is deployed in the
Amazon EC2 environment and adding a new node needs to be as easy as
possible.
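
  Our reading of the wiki's manual procedure, sketched in SQL (again
assuming the default table names; 'node3' and the revision value are
placeholders copied from an existing healthy node, not real values):

```sql
-- After copying the search index files from an existing node to the
-- new node, seed the new node's local revision so it does not try to
-- replay revisions that the janitor has already purged:
INSERT INTO LOCAL_REVISIONS (JOURNAL_ID, REVISION_ID)
VALUES ('node3', 123456);  -- 123456: revision taken from the existing node
```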

  Could you please comment on my concerns? Thanks.


