jackrabbit-users mailing list archives

From Peter Harrison <peter.harri...@team.orcon.net.nz>
Subject Re: Jackrabbit & Performance
Date Wed, 20 Nov 2013 18:27:33 GMT
I am by no means an expert, but I have been developing for three or four 
months with JackRabbit. The approach I've taken is not to include the 
base records under one node.

For example, you may have classes of patent, such as medical, chemical 
process etc, and so you could break the marks down into subnodes, one for 
each class of patent. Finding a particular mark by its ID is still quite 
easy, but not as trivial as simply having a path like /mark/<patentid>.
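
A minimal sketch of how such a path might be derived, assuming 9-digit 
mark IDs like the 000345123 in the quoted message. The class name 
MarkPaths, the /marks root and the two extra levels of ID-prefix buckets 
are my own assumptions for illustration (the buckets are just one way to 
keep each node well under ~10K children); they are not part of Jackrabbit.

```java
public final class MarkPaths {
    private MarkPaths() {}

    /**
     * Builds a deep path such as /marks/medical/000/345/000345123.
     * Hypothetical layout: class of patent first, then two bucket levels
     * taken from the leading digits of the ID, then the ID itself.
     */
    public static String pathFor(String patentClass, String markId) {
        if (!patentClass.matches("[a-z][a-z0-9-]*")) {
            throw new IllegalArgumentException("bad class: " + patentClass);
        }
        if (!markId.matches("\\d{9}")) {
            throw new IllegalArgumentException("expected a 9-digit ID: " + markId);
        }
        String bucket1 = markId.substring(0, 3); // first three digits
        String bucket2 = markId.substring(3, 6); // next three digits
        return "/marks/" + patentClass + "/" + bucket1 + "/" + bucket2 + "/" + markId;
    }
}
```

With this layout no node has more than 1000 children below the class 
level, at the cost of needing the class to compute the path.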

I have put a REST interface in front of JackRabbit that handles simple 
IDs - running the appropriate query, and then returning the object which 
contains the full path.

This idea - that the path itself contains information about a node - takes 
a little getting used to, but it allows you to do some very quick 
reporting on specific classes, as searches can be scoped to specific subtrees.

What I'm learning is that JackRabbit isn't just another kind of DB - so 
you should not treat it as just another kind of flat table. You should 
be creating a deep tree structure rather than a shallow structure. Doing 
this allows you to utilise the path to limit the scope of queries.
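
As a rough sketch of what scoping a query by path can look like, the 
helper below builds a JCR-SQL2 statement restricted to one subtree with 
ISDESCENDANTNODE. The node type [iptool:markType] and property 
[iptool:id] come from the quoted message; the class name ScopedMarkQuery, 
the selector name m and the /marks/medical path are assumptions of mine.

```java
public final class ScopedMarkQuery {
    private ScopedMarkQuery() {}

    /**
     * Returns a JCR-SQL2 statement that looks up a mark by ID, but only
     * among the descendants of scopePath (for example /marks/medical).
     */
    public static String byId(String scopePath, String markId) {
        // Accept only simple absolute paths so the statement stays well-formed.
        if (!scopePath.matches("(/[A-Za-z0-9_-]+)+")) {
            throw new IllegalArgumentException("bad scope path: " + scopePath);
        }
        // In a SQL2 string literal a single quote is escaped by doubling it.
        String literal = markId.replace("'", "''");
        return "SELECT * FROM [iptool:markType] AS m"
                + " WHERE ISDESCENDANTNODE(m, [" + scopePath + "])"
                + " AND m.[iptool:id] = '" + literal + "'";
    }
}
```

The resulting string can be handed to 
session.getWorkspace().getQueryManager().createQuery(statement, Query.JCR_SQL2); 
each row's node then gives you the full path via getPath().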

PS: I have also modified the Java OCM to allow lists of primitives to be 
stored as properties of a single subnode. I've been making changes to 
OCM on my local system, but am not really sure how to contribute back.

On 20/11/13 23:39, Enrique Medina Montenegro wrote:
> Hi list,
> I’ve been evaluating Jackrabbit for several weeks, performing all sorts of
> performance testing due to the nature of the repository we need to create
> here at OHIM. Not sure if you’re aware of us, but we’re the European Office
> where you have to come to protect the intellectual property of your marks
> and designs in the whole European Community. Currently, we store all
> our marks and designs information in a relational DB. Besides serious
> performance issues (it’s an old DB, not Oracle unfortunately), we don’t have
> functionality such as versioning or observation. That, together with the fact
> that our information is well suited to being modelled as an XML document, led
> us to think about storing it in a JCR repository.
> I went through David’s model and decided to create a single node called
> “marks” and then add one child node for each existing mark in our system
> (~1 million marks where each mark would have ~50 versions/revisions), but
> then I found that adding more than 10K child nodes could lead to potential
> performance issues. However, after some testing, I also found that indexing
> the mark nodes allowed us to query them extremely fast using SQL2, so we
> could overcome the issue with the 10K child nodes.
> For example, instead of calling session.getNode("/marks/000345123") we
> could query SELECT * FROM [iptool:markType] WHERE [iptool:id] =
> '000345123' (notice that we defined our own custom node types and also told
> Lucene to index only the [iptool:id] property through the use of the
> IndexConfiguration configuration).
> Everything was then progressing smoothly, but then we realized that in
> order to fetch a specific version or even the base version of a particular
> mark, the API recommended using the VersionManager:
> VersionHistory history =
> session.getWorkspace().getVersionManager().getVersionHistory(markNode.getPath());
> Unfortunately, this API makes use of the direct path access to the node
> being versioned, which in our case was killing our performance due to the
> 10K child nodes limitation (sort of). Although it is possible to
> access the versions directly from the node itself using
> markNode.getBaseVersion() or markNode.getVersionHistory(),
> these methods are deprecated and we are not sure whether they will
> be removed in the near future or left there as an alternative way of
> retrieving the version history from a node.
> Therefore, could I possibly get some answers from you to help us out in
> making our final decision on whether to use Jackrabbit as our official JCR
> repository implementation?
> - Is the direct retrieval of the version history through the node itself
> (now deprecated) going to be eventually removed or not? If so, when is it
> planned to be removed? If not, will it be kept as a “valid” alternative to
> the current VersionManager approach?
> - Using the Lucene indexes gives very fast read times (on the order
> of tens of ms), but do you foresee other hidden issues or side effects of
> maintaining ~1M child nodes underneath the same parent “mark” node?
> - We also played around with the BTreeManager, but we couldn’t make it work
> with custom node types. I even posted this issue to the users mailing list, but
> so far I haven’t got any response:
> http://mail-archives.apache.org/mod_mbox/jackrabbit-users/201311.mbox/ajax/%3CCA%2BdeSP_weUQ0mtSBjoQGy3jq60jZEo7LtmF9kJZkvF1eyNvu-A%40mail.gmail.com%3E
> Thanks so much in advance for helping us out to choose Jackrabbit as our
> JCR technology, hopefully!!! :)
