From: Bertrand Delacretaz
To: users@jackrabbit.apache.org
Date: Fri, 14 Aug 2009 09:11:46 +0200
Subject: Re: Performance of a large number of small nodes

Hi,

On Fri, Aug 14, 2009 at 6:34 AM, Nigel Sim wrote:
> ...I am using Jackrabbit to store a mixture of scientific data, which includes
> files and numerical data. The performance of files are fine, but the
> numerical data needs to be extracted as datasets based on attributes such as
> observation time, and this appears to be quite slow in comparison to a
> native DB (obviously). I would really prefer to keep all this related data
> in the same management system, so is there a way to improve the ingestion
> and retrieval of many small nodes?...

Could you take advantage of paths to express the observation time, and
use that for "queries"?

Storing data under paths like /data/2009/12/24/23/02/58 would allow
you to find nodes that belong to a specific day, or hour, by
navigating paths, which might be much more efficient than queries.

> ...My second question, is there an efficient way to query for the latest
> observation? I would assume querying for the node type, sorting, and just
> retrieving the first result?...

Paths would also help here, and you could use observation to keep
track of the path that corresponds to the most recent data, if needed.

-Bertrand
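
As a rough illustration of the path idea above, here is a minimal sketch in plain Java. It only shows how a timestamp could map to a path like /data/2009/12/24/23/02/58 and how day or hour lookups then reduce to path prefixes. The class ObservationPaths and all of its methods are hypothetical names made up for this example, and the sorted set is just an in-memory stand-in for the repository; no Jackrabbit or JCR calls are used. It assumes four-digit years and zero-padded segments, so that sorting paths as strings matches chronological order.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.Optional;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Jackrabbit or JCR.
final class ObservationPaths {

    // Root used in the example path from the mail.
    static final String ROOT = "/data";

    // Builds /data/yyyy/MM/dd/HH/mm/ss with zero-padded segments.
    static String pathFor(LocalDateTime t) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/%02d/%02d/%02d/%02d",
                ROOT, t.getYear(), t.getMonthValue(), t.getDayOfMonth(),
                t.getHour(), t.getMinute(), t.getSecond());
    }

    // Path of the node grouping all observations of one day.
    static String dayPath(LocalDateTime t) {
        return pathFor(t).substring(0, ROOT.length() + "/yyyy/MM/dd".length());
    }

    // Path of the node grouping all observations of one hour.
    static String hourPath(LocalDateTime t) {
        return pathFor(t).substring(0, ROOT.length() + "/yyyy/MM/dd/HH".length());
    }

    // Stand-in for navigating below a parent node: keeps paths under 'parent'.
    static List<String> descendantsOf(NavigableSet<String> paths, String parent) {
        return paths.stream()
                .filter(p -> p.startsWith(parent + "/"))
                .collect(Collectors.toList());
    }

    // With fixed-width segments, the greatest path is the most recent one.
    static Optional<String> latest(NavigableSet<String> paths) {
        return paths.isEmpty() ? Optional.empty() : Optional.of(paths.last());
    }

    public static void main(String[] args) {
        NavigableSet<String> stored = new TreeSet<>();
        stored.add(pathFor(LocalDateTime.of(2009, 12, 24, 23, 2, 58)));
        stored.add(pathFor(LocalDateTime.of(2009, 12, 24, 22, 15, 0)));
        stored.add(pathFor(LocalDateTime.of(2009, 12, 25, 1, 0, 0)));

        LocalDateTime when = LocalDateTime.of(2009, 12, 24, 23, 0, 0);
        System.out.println(descendantsOf(stored, dayPath(when)));  // both paths from 24 Dec
        System.out.println(descendantsOf(stored, hourPath(when))); // only the 23:02:58 path
        System.out.println(latest(stored).orElse("none"));         // the 25 Dec path
    }
}
```

In a real repository the prefix filter would be replaced by fetching the day or hour node and iterating its children, and the "latest" lookup could instead read a pointer that an observation listener keeps up to date, as suggested above.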