From: Nigel Sim
Date: Sat, 15 Aug 2009 14:32:29 +1000
Subject: Re: Performance of a large number of small nodes
To: users@jackrabbit.apache.org

Hi Bertrand,

Thanks for your suggestion. Unfortunately, even in the simplest case of 100 nodes under the root node, the time taken to retrieve them is too long. If I could resolve this fundamental speed issue, then I could apply your solution to help me scale my system.

I think I just need to bite the bullet and admit my use case doesn't really map onto Jackrabbit :)

Thanks
Nigel

2009/8/14 Bertrand Delacretaz
> Hi,
>
> On Fri, Aug 14, 2009 at 6:34 AM, Nigel Sim wrote:
> > ...I am using Jackrabbit to store a mixture of scientific data, which includes
> > files and numerical data. The performance of files is fine, but the
> > numerical data needs to be extracted as datasets based on attributes such as
> > observation time, and this appears to be quite slow in comparison to a
> > native DB (obviously). I would really prefer to keep all this related data
> > in the same management system, so is there a way to improve the ingestion
> > and retrieval of many small nodes?...
>
> Could you take advantage of paths to express the observation time, and
> use that for "queries"?
>
> Storing data under paths like /data/2009/12/24/23/02/58 would allow
> you to find nodes that belong to a specific day, or hour, by
> navigating paths, which might be much more efficient than queries.
>
> > ...My second question: is there an efficient way to query for the latest
> > observation? I would assume querying for the node type, sorting, and just
> > retrieving the first result?...
>
> Paths would also help here, and you could use observation to keep
> track of the path that corresponds to the most recent data, if needed.
>
> -Bertrand

--
JCU eResearch Centre
School Of Business (IT)
James Cook University
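Bertrand's path-based suggestion can be sketched roughly as follows. This is a minimal, stdlib-only Java sketch that does not call the Jackrabbit/JCR API: the PathStore class and its pathFor, add, under and latest methods are hypothetical, and an in-memory tree whose children are kept sorted stands in for repository nodes. It illustrates the two ideas from the thread: collecting all observations for a day or hour by navigating to a path prefix instead of running a query, and finding the latest observation by descending into the greatest child name at each level, which works because the zero-padded segments sort chronologically.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a content repository (NOT the JCR API):
// every node keeps its children in a TreeMap, so child names iterate in sorted order.
class PathStore {
    static final class Node {
        final TreeMap<String, Node> children = new TreeMap<>();
        Double value; // set only on leaf (observation) nodes
    }

    private final Node root = new Node();

    // Zero-padded segments make name order match time order,
    // e.g. 2009-12-24T23:02:58 -> /data/2009/12/24/23/02/58
    static String pathFor(LocalDateTime t) {
        return String.format("/data/%04d/%02d/%02d/%02d/%02d/%02d",
                t.getYear(), t.getMonthValue(), t.getDayOfMonth(),
                t.getHour(), t.getMinute(), t.getSecond());
    }

    // Create intermediate "folder" nodes as needed and store the value at the leaf.
    void add(LocalDateTime t, double value) {
        Node n = root;
        for (String seg : pathFor(t).substring(1).split("/")) {
            n = n.children.computeIfAbsent(seg, k -> new Node());
        }
        n.value = value;
    }

    // Navigate to a prefix such as /data/2009/12/24 (a day) or /data/2009/12/24/23
    // (an hour) and collect every observation path underneath it, without a query.
    List<String> under(String prefix) {
        Node n = root;
        for (String seg : prefix.substring(1).split("/")) {
            n = n.children.get(seg);
            if (n == null) return List.of(); // nothing stored for that period
        }
        List<String> out = new ArrayList<>();
        collect(n, prefix, out);
        return out;
    }

    private static void collect(Node n, String path, List<String> out) {
        if (n.value != null) out.add(path);
        for (Map.Entry<String, Node> e : n.children.entrySet()) {
            collect(e.getValue(), path + "/" + e.getKey(), out);
        }
    }

    // Latest observation: starting at /data, repeatedly step into the greatest child name.
    String latest() {
        Node n = root.children.get("data");
        if (n == null) return null;
        StringBuilder path = new StringBuilder("/data");
        while (!n.children.isEmpty()) {
            Map.Entry<String, Node> last = n.children.lastEntry();
            path.append('/').append(last.getKey());
            n = last.getValue();
        }
        return path.toString();
    }

    public static void main(String[] args) {
        PathStore store = new PathStore();
        store.add(LocalDateTime.of(2009, 12, 24, 23, 2, 58), 1.5);
        store.add(LocalDateTime.of(2009, 12, 24, 9, 15, 0), 0.7);
        store.add(LocalDateTime.of(2009, 12, 25, 0, 0, 1), 2.1);
        System.out.println(store.under("/data/2009/12/24")); // both observations from that day
        System.out.println(store.latest());                  // /data/2009/12/25/00/00/01
    }
}
```

In a real repository, child nodes are not guaranteed to be returned sorted by name, so the "latest" walk would need to sort child names itself or rely on nodes being inserted in time order; alternatively, as Bertrand notes, an observation listener could record the path of the most recent node as data arrives, avoiding the walk entirely.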