From: Enrique Medina Montenegro
Date: Thu, 14 Nov 2013 17:57:29 +0100
Subject: Re: FW: Jackrabbits reliability and performance
To: users@jackrabbit.apache.org
Cc: Mark Essex, Tarun Dogra

Well, again, it depends on how you set up the JCR repository, so all I can
give you is a conditional yes to your question...

Regards,
Quique.

On Thu, Nov 14, 2013 at 5:21 PM, Tarun Dogra wrote:

> Hi Enrique,
>
> Thanks for the detailed reply. Unfortunately, I am not familiar with the
> nodes and the BTree side of the Jackrabbit framework, so I was expecting
> an answer in terms of the overall picture of how Jackrabbit, as a JCR
> implementation, will fit into our system.
>
> In brief, we need to integrate Jackrabbit (as advised by our vendor) into
> our clinical trial management system. I have already provided you with
> the specification of the server on which the system will be hosted. So I
> just wanted to know whether, on such a server, Jackrabbit can take in
> approximately 15 GB of data per year and manage that many documents/files
> (as mentioned before) without its performance being affected.
> We already know it is a very stable JCR implementation, but we just wanted
> to confirm whether such a system can meet our organisation’s requirements.
>
> Regards,
> Tarun
>
>
> From: Enrique Medina Montenegro [mailto:e.medina.m@gmail.com]
> Sent: 14 November 2013 14:29
> To: users@jackrabbit.apache.org
> Cc: Mark Essex
> Subject: Re: Jackrabbits reliability and performance
>
> Hi Tarun,
>
> Let me share my findings with you :-)
>
> At my work we are evaluating Jackrabbit as the basis for a JCR repository
> that stores the register of marks (intellectual property) as documents,
> each composed basically of an ID, some metadata (who created it, when,
> etc.) and the XML and JSON representations of the mark itself. Currently,
> all that information is spread across several relational DBs, and we would
> like to take advantage of the versioning and observation features of the
> JCR repository.
>
> During our initial evaluation, mostly focused on performance, we noticed
> serious issues when adding the 1 million marks we currently have in our
> DBs underneath the same "parent" node. We found out that this is actually
> a known limitation of Jackrabbit, whose documentation clearly states that
> no more than 10K child nodes should be added to the same "parent" node:
>
> http://wiki.apache.org/jackrabbit/Performance
>
> However, we were still more or less forced to follow that path, because we
> were required to perform an initial dump of all the data in the DBs, and
> simply adding each mark as a sub-node proved to be the fastest way to
> export all the data within an acceptable time window.
>
> Nevertheless, we also tried to shard the nodes as a tree, basically
> splitting the 9-digit ID of our marks into 3-digit groups, so that each
> node could have at most 1K sub-nodes. For example, the mark with
> ID = 000342865 would be saved into root (node) -> marks (node) ->
> 000 (node) -> 342 (node) -> 000342865 (node).
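
A minimal sketch of the ID sharding described in the quoted paragraph, in
plain Java. It only derives the repository path for a mark from its 9-digit
ID and does not touch the JCR API. The class and method names (MarkPaths,
shardedPath) are hypothetical, and the "/marks" root is an assumption taken
from the example path in the message:

```java
final class MarkPaths {

    // Assumed root node for all marks, taken from the example in the message.
    private static final String ROOT = "/marks";

    private MarkPaths() {
    }

    /**
     * Builds the sharded path for a 9-digit mark ID: the first two 3-digit
     * groups become intermediate nodes and the full ID names the leaf node.
     */
    static String shardedPath(String id) {
        if (id == null || !id.matches("[0-9]{9}")) {
            throw new IllegalArgumentException("expected a 9-digit ID, got: " + id);
        }
        String first = id.substring(0, 3);   // e.g. "000"
        String second = id.substring(3, 6);  // e.g. "342"
        return ROOT + "/" + first + "/" + second + "/" + id;
    }

    public static void main(String[] args) {
        // prints /marks/000/342/000342865
        System.out.println(shardedPath("000342865"));
    }
}
```

With this layout every intermediate node has at most 1,000 children, which
stays well below the 10K child-node guideline cited in the message.
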
> Theoretically, this would perform much better than our original approach,
> but as a downside it would dramatically increase the time it takes to
> export the 1M marks from the DBs, pushing it further outside our
> acceptable time window. For each mark, the export first had to look up
> the exact node under which to store it, and the bigger the JCR repository
> grew, the slower those node lookups became, which slowed down the overall
> export process.
>
> We also took a look at the BTreeManager, but we just couldn't make it work
> due to the issue I describe here (which, BTW, has not been answered yet):
>
> http://mail-archives.apache.org/mod_mbox/jackrabbit-users/201311.mbox/ajax/%3CCA%2BdeSP_weUQ0mtSBjoQGy3jq60jZEo7LtmF9kJZkvF1eyNvu-A%40mail.gmail.com%3E
>
> So, getting back to the original approach of storing everything under the
> same node, how did we manage to get acceptable read times? Well, it boils
> down to using Lucene's indexing (configured, through the
> IndexingConfiguration in the Search section of the repository config file,
> to index only the "id" property and not all the XML and JSON stuff) to
> actually perform the search/retrieval of marks. So, for instance, instead
> of:
>
> session.getNode("/marks/000342865") --> takes ~2.4 s with 1M marks under
> the same node
>
> we run this SQL2 query:
>
> SELECT * FROM markType WHERE id = '000342865' --> takes tens of ms with 1M
> marks under the same node thanks to Lucene's indexes
>
> (notice that "markType" is a custom node type that we created to model our
> domain, in this case the marks)
>
> LESSONS LEARNED: You need to clearly define the scope of your project in
> terms of the functionality you intend to use from Jackrabbit, and then
> plan detailed performance workshops to prove your approach.
There are > always trade-offs (for instance, in my case, when I want to get the > specific version of a mark, I cannot use the "official" API through > "VersionManager" because it uses direct path to fetch the node prior to > getting the revision --> > session.getWorkspace().getVersionManager().getVersionHistory("/marks/0003= 42865").getVersionByLabel("v.6.0"), > and I have to use the "deprecated" API method from the node itself, once > I've got it using the SQL2 statement mentioned above --> > markNode.getVersionHistory().getVersionByLabel("v.6.0"), with the > uncertainty on when that deprecated API will be removed...). > > Please share your findings in the list as you make progress :-) > > Regards, > Enrique Medina. > > On Thu, Nov 14, 2013 at 10:40 AM, Tarun Dogra > wrote: > Respected Sir/Madam, > > In the next couple of months, we (ORION Clinical Services Ltd., UK) are > about to release a clinical trial management system as a product to be us= ed > in-house by all our employees. We have bought this product off the shelf > from a third party vendor. As suggested by our vendor, we would implement > JackRabbit as the central repository system within this main product. But > we are still not sure whether jackrabbit is an ideal solution to be > integrated with our product and this is where we will need your help and > would appreciate if you could share your expertise. > > Just to give you an overview of our organisation, we will have around 750= 0 > documents (each of size 250K approximately on an average) per "study" > within our clinical trial management framework. We usually take on board > around 7-8 such studies per year. So, on the basis of 8 studies per year= , > the total size of all the documents will grow to 7500 x 250 x 8 =3D 15GB > approximately per year. So just wanted to know a couple of things from yo= u: > > 1. Is Jackrabbit reliable enough as a system to cater to our above > mentioned needs? and > > 2. 
Will the management of so many documents have any adverse effect= s > on jackrabbit's performance? - considering that Jackrabbit will reside on > one of our own hosted server with the following spec - > > Poweredge R710 > > CPU: 2 x Intel X5550 > > Memory: 16GB > > Operating System: Windows 2008 R2 64bit SP1 > > Disk capacity: C: 142gb and D: 1.22Tb > > > Sorry if you are not the correct department to consult to in regards to > our above mentioned concern and if this is the case, it will be much > appreciated if you could direct us to the right department/person? Many > thanks. > > Look forward to hearing from you. > > Regards, > Tarun > > ________________________________ > **********************************Legal & Confidentiality > Notice************************************** > This email and attachments hereto are strictly private and confidential. > Reading, copying, disclosure or use by anybody else is not authorised. If > you have received this email in error, please delete it and notify us as > soon as possible. > The antivirus software used by ORION is automatically and constantly > updated in an effort to minimise the risk of viruses infecting our system= s, > However, you should be aware that there is no absolute guarantee that any > files attached to this email are virus free. > ORION may monitor email traffic data and also the content of email for th= e > purposes of security and staff training. > ORION Clinical Services Limited is a private limited company registered i= n > England. Company number 3457136. Registered address: 7 Bath Road, Slough, > Berkshire, SL1 3UA. ORION Clinical Services Limited is the parent company > of a number of subsidiary companies. 
> For further details please visit our website at www.orioncro.com
> ________________________________________