jackrabbit-users mailing list archives

From Morrell Jacobs <mjac...@maned.com>
Subject Re: Jackrabbit & cluster with Oracle backend
Date Fri, 16 May 2014 20:08:57 GMT
Hi Fabrice,

We are using a similar architecture, but with some different backend components:

JackRabbit: 2.7 (soon to move to 2.8)
Persistence: MySQL
Datastore: S3

The JackRabbit JAR is included in the WAR.  A cluster of Tomcats hosts the WAR, with
a load balancer in front.

For the global / root file system, we use DbFileSystem so it is shared by all nodes.  I’m not
sure if this is wrong, but it works for us.  The only anomaly is warnings or errors on
startup after changes to NodeTypes; despite these errors, JackRabbit does start up and function.

The workspace file system uses LocalFileSystem and is stored in the workspace home (in the local
repository folder).

Currently, each Tomcat instance creates a new local repository folder on startup and generates
a unique cluster id.  We plan to change this to something similar to what you describe:

Have a background process that runs periodically.  After starting and shutting down JackRabbit
to create an updated repository folder, the process would remove the cluster_node.id file and
archive the folder.  When a new node starts up, it would make a local copy of the archived
repository folder before starting JackRabbit.  My tests indicate this should work, but
we have not yet gone ahead with full implementation.
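
As a rough illustration of the copy step in that plan, here is a minimal Java sketch using only
java.nio.file.  The class and method names are hypothetical, it skips any file named
cluster_node.id wherever it appears in the tree (an assumption about where the file lives),
and it assumes JackRabbit is shut down while the copy runs.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: copies a repository folder but leaves out the cluster id file,
// so a node started from the copy gets a freshly generated cluster id.
final class RepositorySnapshot {

    // File name taken from the message above.
    static final String CLUSTER_ID_FILE = "cluster_node.id";

    // Usable in both directions: live folder -> archive, and archive -> new node's folder.
    static void copyWithoutClusterId(final Path source, final Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate the directory structure under the target.
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Skip the cluster id file; copy everything else.
                if (!file.getFileName().toString().equals(CLUSTER_ID_FILE)) {
                    Files.copy(file, target.resolve(source.relativize(file)),
                            StandardCopyOption.REPLACE_EXISTING);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

The periodic process would call this once to produce the archive, and a new node would call it
again to seed its local repository folder before starting JackRabbit.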

I’m not sure if you have a reason for assigning the cluster id yourself; if there is no cluster
id, JackRabbit will generate a unique value (not sure if this is true in 2.2).

We’ve only run into a couple of issues running JackRabbit in this configuration:

* We need to call Session.refresh(true) before calling Session.save().  We were not doing
this initially and would get occasional errors.
* Every once in a while (roughly every few months; we have not been able to determine the
conditions), the search indices become corrupt on one of the Tomcat instances.  Our current
fix is to stop Tomcat, discard the local repository folder, then restart Tomcat; JackRabbit
takes some time to start up, but it rebuilds all local data in the repository folder,
eliminating any corruption.
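
The refresh-before-save pattern from the first point might look like the sketch below.  So that
it compiles without the JCR API jar, it declares a small stand-in interface mirroring the two
javax.jcr.Session methods involved; the interface, class and method names are hypothetical, and
real code would take a javax.jcr.Session directly.

```java
// Hypothetical wrapper illustrating the call order described above.
final class RefreshThenSave {

    // Stand-in for javax.jcr.Session, limited to the two methods used here.
    interface JcrSession {
        void refresh(boolean keepChanges) throws Exception;
        void save() throws Exception;
    }

    static void saveWithRefresh(JcrSession session) throws Exception {
        // keepChanges = true: pick up changes persisted elsewhere (for example by
        // another cluster node) while keeping this session's pending changes.
        session.refresh(true);
        // Then persist the pending changes.
        session.save();
    }
}
```

Routing every save through one helper like this keeps the refresh from being forgotten at
individual call sites.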

I think your approach should work, although I’m not sure you need to manually assign cluster
ids (unless you have some reason for controlling them).

Let me know if you have any other questions about our setup.

On May 15, 2014, at 1:09 PM, Fabrice Aupert <fabrice@gmail.com> wrote:

> Hi,
> We're building a 'document manager' for an existing J2EE (java5/Websphere
> 6.1) webapp deployed in a cluster. This manager has to be fully integrated
> into the webapp. Due to production constraints, storing data in a shared
> filesystem is not an option. All data/metadata must be stored in an Oracle
> 10g DB.
> We have a working prototype based on Jackrabbit 2.2.13. On each node of the
> cluster, the webapp embeds jackrabbit JAR and owns a dedicated repository
> directory on the local filesystem. This repo contains a repository.xml file
> which is pretty much the same on all nodes except for <Cluster id=""> (see
> attached file). Once the webapp started, the local 'repository' directory
> contains only a few files, index essentially. Example :
> ./repository
> ./repository/repository.xml
> ./repository/workspaces
> ./repository/workspaces/security
> ./repository/workspaces/security/workspace.xml
> ./repository/workspaces/security/index
> ./repository/workspaces/security/index/indexes_2
> ./repository/workspaces/security/index/_0
> ./repository/workspaces/security/index/_0/cache.inSegmentParents
> ./repository/workspaces/security/index/_0/segments_1
> ./repository/workspaces/security/index/_0/segments.gen
> ./repository/workspaces/security/index/_0/segments_2
> ./repository/workspaces/security/index/_0/_0.cfs
> ./repository/workspaces/myrepo
> ./repository/workspaces/myrepo/workspace.xml
> ./repository/workspaces/myrepo/index
> ./repository/workspaces/myrepo/index/indexes_2
> ./repository/workspaces/myrepo/index/_0
> ./repository/workspaces/myrepo/index/_0/_2.cfs
> ./repository/workspaces/myrepo/index/_0/segments_4
> ./repository/workspaces/myrepo/index/_0/cache.inSegmentParents
> ./repository/workspaces/myrepo/index/_0/segments_1
> ./repository/workspaces/myrepo/index/_0/segments.gen
> ./repository/revision.log
> From what we've seen, a thread is started on each node by jackrabbit to
> refresh indexes periodically, allowing synchronization inside the cluster.
> This architectural layout seems to work but, as we lack any real world
> experience with jackrabbit in this context, we would like to check with the
> community that we're not bending jackrabbit capabilities in the wrong
> direction. Could it lead to silent data corruption/inconsistencies ?
> The second point is about giving operations decent tooling to manage the
> jackrabbit repo :
> - Admin console : we were thinking, as our embedded jackrabbit does not
> expose RMI or Webdav interface, relying on jackrabbit-standalone (either
> cli or server mode) : by copying a repository.xml, changing its cluster id,
> and starting a new session with the standalone version from this file, we
> could manage our nodes and to search (using jackrabbitexplorer on top of it
> for example). Could it be a viable solution ?
> - Repo inconsistencies : does the OraclePersistenceManager really support
> the <param name="consistencyFix" value="true" /> ? It does not seem so. Are
> there other tools we could use to investigate and fix problems inside repo
> data ?
> Any input on this matter would be extremely valuable to us.
> Thanks.
> Fabrice Aupert
