Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@jackrabbit.apache.org
Received-SPF: pass (nike.apache.org: local policy)
Message-ID: <4BCC26A9.7030801@rug.nl>
Date: Mon, 19 Apr 2010 11:47:21 +0200
From: Dennis van der Laan <d.g.van.der.laan@rug.nl>
Organization: University of Groningen
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US;
 rv:1.9.1.9) Gecko/20100317 Lightning/1.0b1 Thunderbird/3.0.4
MIME-Version: 1.0
To: users@jackrabbit.apache.org
Subject: Performance question when clustering
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi all,

We are setting up a clustered Jackrabbit environment as the data storage
for our (custom) CMS. We are using Jackrabbit 1.6.0 with an Oracle 10g
database, the bundle persistence manager and fine-grained ISM locking.
Whenever the repository is accessed through a JcrSession, we first call
session.refresh(). We assumed clustering would add some overhead, but
that spreading the load over several nodes would relieve the
much-talked-about performance bottleneck in the PersistenceManager
and/or SharedItemStateManager, and that this boost would compensate for
the overhead.

What we have observed is that adding cluster nodes to the same
Jackrabbit repository has a significant impact on performance, with
almost linear degradation. We ran a performance test that uploads files
to just one machine, first in a two-machine cluster and then in a
six-machine cluster. The first run was about 3x faster than the second.
Most of the time, the threads are waiting on a Mutex in
org.apache.jackrabbit.core.cluster.ClusterNode.sync() (called from
SessionImpl.refresh()). Uploading to multiple cluster machines at the
same time only seemed to make the degradation worse, probably because
the Mutex is held longer when other nodes in the cluster have updates,
too.
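
The hypothesis in the last sentence can be illustrated with a toy model.
This is only a sketch under assumptions, not Jackrabbit code: it assumes
every node appends its changes to one shared journal and that a sync has
to walk all entries written since the node's last sync while holding a
lock. The names SyncModel, Node and foreignWork are made up for this
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only, not Jackrabbit code; every name here is hypothetical.
class SyncModel {
    static class Node {
        final int id;
        final List<Integer> journal;   // shared log; each entry is the writer's node id
        // Stands in for the Mutex; the model itself runs single-threaded.
        final ReentrantLock syncLock = new ReentrantLock();
        int revision = 0;              // index of the next journal entry to read

        Node(int id, List<Integer> journal) {
            this.id = id;
            this.journal = journal;
        }

        // Appends one change to the shared journal.
        void write() {
            synchronized (journal) {
                journal.add(id);
            }
        }

        // Catches up with the journal and returns how many entries from
        // other nodes had to be applied while the lock was held.
        int sync() {
            syncLock.lock();
            try {
                int applied = 0;
                synchronized (journal) {
                    while (revision < journal.size()) {
                        if (journal.get(revision) != id) {
                            applied++;
                        }
                        revision++;
                    }
                }
                return applied;
            } finally {
                syncLock.unlock();
            }
        }
    }

    // Each round, every node writes once and then node 0 syncs.
    // Returns the total number of foreign entries node 0 applied.
    static int foreignWork(int nodeCount, int rounds) {
        List<Integer> journal = new ArrayList<>();
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node(i, journal));
        }
        int total = 0;
        for (int r = 0; r < rounds; r++) {
            for (Node n : nodes) {
                n.write();
            }
            total += nodes.get(0).sync();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(foreignWork(2, 100)); // prints 100
        System.out.println(foreignWork(6, 100)); // prints 500
    }
}
```

In this model the work done under the lock grows with the number of
other writing nodes. Whether the real ClusterNode.sync() behaves this
way is exactly what we would like to find out.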

So my questions are:
- Is it a good idea to call session.refresh() every time we use the
session?
- Is there a difference between calling session.refresh() and the
automatic sync done by the ClusterNode thread?
- Why is a refresh more expensive when there are more cluster nodes?
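
To make the first question concrete: one alternative to refreshing on
every access would be to refresh at most once per interval. The sketch
below is only an illustration under assumptions. ThrottledRefresher is a
hypothetical helper, not part of the JCR or Jackrabbit API, and the
Runnable stands in for a call such as Session.refresh(boolean) with its
RepositoryException handled by the caller.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the JCR or Jackrabbit API.
class ThrottledRefresher {
    private final Runnable refresh;        // stand-in for the session refresh call
    private final long minIntervalMillis;  // minimum time between two refreshes
    private final LongSupplier clock;      // injected so the behavior can be tested
    private long lastRefresh;
    private boolean refreshedOnce = false;

    ThrottledRefresher(Runnable refresh, long minIntervalMillis, LongSupplier clock) {
        this.refresh = refresh;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    // Runs the refresh only if it has never run or the interval has
    // elapsed since the last run. Returns true if the refresh ran.
    synchronized boolean refreshIfStale() {
        long now = clock.getAsLong();
        if (refreshedOnce && now - lastRefresh < minIntervalMillis) {
            return false;
        }
        refresh.run();
        lastRefresh = now;
        refreshedOnce = true;
        return true;
    }
}
```

In production the clock would be System::currentTimeMillis. The
trade-off is that a session may see data that is up to one interval out
of date, which may or may not be acceptable for a CMS.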

Thanks for any information!

Dennis

-- 
Dennis van der Laan, MSc
Centre for Information Technology
University of Groningen