jackrabbit-users mailing list archives

From Enrique Medina Montenegro <e.medin...@gmail.com>
Subject Jackrabbit & Performance
Date Wed, 20 Nov 2013 10:39:55 GMT
Hi list,


I’ve been evaluating Jackrabbit for several weeks, performing all sorts of
performance testing due to the nature of the repository we need to create
here at OHIM. Not sure if you’re aware of us, but we’re the European Office
where you have to come to protect the intellectual property of your marks
and designs in the whole European Community. Currently, we store all our
marks and designs information in a relational DB. Besides serious
performance issues (it’s an old DB, not Oracle unfortunately), we lack
functionality such as versioning and observation. That, together with the
fact that our information is well suited to being modelled as an XML
document, led us to consider storing it in a JCR repository.



I went through David’s model and decided to create a single node called
“marks” and then add one child node for each existing mark in our system
(~1 million marks where each mark would have ~50 versions/revisions), but
then I found that adding more than 10K child nodes could lead to potential
performance issues. However, after some testing, I also found that indexing
the mark nodes allowed us to query them extremely fast using SQL2, so we
could overcome the issue with the 10K child nodes.



For example, instead of calling session.getNode("/marks/000345123") we
could run the query SELECT * FROM [iptool:markType] WHERE [iptool:id] =
'000345123' (notice that we defined our own custom node types and also told
Lucene to index only the [iptool:id] property through the use of the
IndexConfiguration configuration).
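
To make the lookup above concrete, here is a minimal sketch of how the SQL2 statement could be built for a given mark id. The class and method names (MarkQueries, quote, byId) are hypothetical and only for illustration. The sketch assumes that a single quote inside a JCR-SQL2 string literal is escaped by doubling it. Only the statement string is built here; the comment shows where it would be handed to the standard JCR QueryManager.

```java
public final class MarkQueries {
    private MarkQueries() {}

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the JCR-SQL2 statement that selects one mark by its iptool:id. */
    public static String byId(String markId) {
        return "SELECT * FROM [iptool:markType] WHERE [iptool:id] = " + quote(markId);
    }

    public static void main(String[] args) {
        // With a live session, the statement would be executed roughly as:
        //   session.getWorkspace().getQueryManager()
        //          .createQuery(statement, javax.jcr.query.Query.JCR_SQL2)
        //          .execute().getNodes();
        System.out.println(byId("000345123"));
    }
}
```

Building the statement in one place keeps the node type and property names out of the calling code, so the same helper can be reused wherever a mark has to be fetched without going through its path.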



Everything was progressing smoothly until we realized that, in order to
fetch a specific version or even the base version of a particular mark, the
API recommends using the VersionManager:



VersionHistory history =
session.getWorkspace().getVersionManager().getVersionHistory(markNode.getPath());



Unfortunately, this API accesses the versioned node by its direct path,
which in our case was killing our performance due to the 10K child nodes
limitation (sort of). It is possible to access the versions directly from
the node itself using markNode.getBaseVersion() or
markNode.getVersionHistory(), but these methods are deprecated, and we are
not sure whether they will be removed in the near future or kept as an
alternative way of retrieving the version history from a node.



Therefore, could I possibly get some answers from you to help us out in
making our final decision on whether to use Jackrabbit as our official JCR
repository implementation?



- Is the direct retrieval of the version history through the node itself
(now deprecated) going to be removed eventually? If so, when is the removal
planned? If not, will it be kept as a “valid” alternative to the current
VersionManager approach?

- Using the Lucene indexes gives us very fast read times (on the order of
tens of ms), but do you foresee any hidden issues or side effects of
keeping ~1M child nodes underneath the same parent “marks” node?

- We also experimented with the BTreeManager, but we couldn’t make it work
with custom node types. I even posted this issue to the users mailing list,
but so far I haven’t received any response:

http://mail-archives.apache.org/mod_mbox/jackrabbit-users/201311.mbox/ajax/%3CCA%2BdeSP_weUQ0mtSBjoQGy3jq60jZEo7LtmF9kJZkvF1eyNvu-A%40mail.gmail.com%3E

Thanks so much in advance for your help. Hopefully we will be able to
choose Jackrabbit as our JCR technology! :-)

