Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm
Reply-To: users@jackrabbit.apache.org
Message-ID: <4B8FB613.9040802@day.com>
Date: Thu, 04 Mar 2010 14:30:59 +0100
From: Michael Dürig
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2
MIME-Version: 1.0
To: users@jackrabbit.apache.org
Subject: Re: jcr2spi NodeIterator.getNode() performances
References: <61037F0E-07E9-490D-8520-91E9540F7908@maya-systems.com> <4B8A825A.8090509@day.com> <8f70391002281221u50052015xf8fc7e3a98d4a5bf@mail.gmail.com>
In-Reply-To: <8f70391002281221u50052015xf8fc7e3a98d4a5bf@mail.gmail.com>
Content-Type:
text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

> I am interested on these parameters to improve jackrabbit performance. I
> have an installation with more than 2 million of documents and performance
> is actually poor :(

On the current trunk there are 3 parameters which can be used to tweak
performance for jcr2spi/spi2davex: the size of the item info cache, the
size of the item cache and the depth of batch read operations.

Some Background:

The item cache contains JCR items (i.e. nodes and properties). The item
info cache contains item infos. An item info is an entity representing a
node or a property on the SPI layer. The jcr2spi module receives item
infos from an SPI implementation (e.g. spi2davex) and uses them to build
up a hierarchy of JCR items.

When an item is requested from the JCR API, jcr2spi first checks whether
the item is in the item cache. If so, that item is returned. If not, the
request is passed down to the SPI. But before actually calling the SPI,
the item info cache is checked first. If this cache contains the requested
item info, the relevant part of the JCR hierarchy is built and the
corresponding JCR item is placed into the item cache. Only when the item
info cache does not contain the requested item info is a call made to the
SPI.

Here the batch read depth comes into play. Since calls to the SPI cause
some latency (e.g. network round trips), the SPI may, in addition to the
actually requested item info, return additional item infos. The batch read
depth parameter specifies the depth down to which item infos of the
children of the requested item info are returned.

Overall, the size of the item info cache and the batch read depth should
be used to optimize for the requirements of the back-end (i.e. network and
server). In general, the item info cache should be large enough to
*easily* hold all items from multiple batches.
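The lookup order described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the jcr2spi implementation: `LookupSketch`, `LruCache`, `Spi` and `treeSpi` are hypothetical names, items and item infos are modelled as plain strings, and LRU eviction is an assumption (the actual eviction policy of the caches is not described here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two-level lookup; none of these types are Jackrabbit classes. */
class LookupSketch {

    /** Small LRU cache built on LinkedHashMap's access order (assumed eviction policy). */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // true = iterate in access order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    /** Stand-in for the SPI: returns the requested path plus descendants down to a depth. */
    interface Spi {
        List<String> getItemInfos(String path, int depth);
    }

    final Map<String, String> itemCache;     // path -> JCR item (a String here)
    final Map<String, String> itemInfoCache; // path -> item info (a String here)
    final Spi spi;
    final int batchReadDepth;
    int spiCalls = 0;                        // counts simulated round trips

    LookupSketch(int itemCacheSize, int itemInfoCacheSize, int batchReadDepth, Spi spi) {
        this.itemCache = new LruCache<>(itemCacheSize);
        this.itemInfoCache = new LruCache<>(itemInfoCacheSize);
        this.batchReadDepth = batchReadDepth;
        this.spi = spi;
    }

    String getItem(String path) {
        String item = itemCache.get(path);
        if (item != null) {
            return item;                     // 1. item cache hit
        }
        if (itemInfoCache.get(path) == null) {
            spiCalls++;                      // 3. both caches missed: call the SPI
            for (String p : spi.getItemInfos(path, batchReadDepth)) {
                itemInfoCache.put(p, "info:" + p); // keep the whole batch
            }
        }
        item = "item:" + path;               // 2. build the item from its item info
        itemCache.put(path, item);
        return item;
    }

    /** Fake SPI over a fixed set of paths, for demonstration only. */
    static Spi treeSpi(List<String> allPaths) {
        return (path, depth) -> {
            List<String> batch = new ArrayList<>();
            batch.add(path);
            for (String p : allPaths) {
                if (p.startsWith(path + "/") && depthOf(p) - depthOf(path) <= depth) {
                    batch.add(p);
                }
            }
            return batch;
        };
    }

    static int depthOf(String path) {
        return (int) path.chars().filter(c -> c == '/').count();
    }

    public static void main(String[] args) {
        Spi spi = treeSpi(Arrays.asList("/a", "/a/b", "/a/b/c", "/a/d"));
        LookupSketch repo = new LookupSketch(10, 100, 1, spi);
        repo.getItem("/a");     // SPI call; the batch also brings in /a/b and /a/d
        repo.getItem("/a/b");   // served from the item info cache, no SPI call
        repo.getItem("/a/b/c"); // below the batch depth, so a second SPI call
        System.out.println("SPI calls: " + repo.spiCalls);
    }
}
```

With a batch read depth of 1 the three reads in `main` cost two simulated SPI calls. Raising the depth to 2 would fetch `/a/b/c` in the first batch as well, at the price of a larger batch that the item info cache has to hold.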
The batch read depth should be a trade-off between network latency and
item info cache overhead.

Finally, the item cache should be used to optimize for the requirements of
the front-end (i.e. the JCR API client). It should be able to hold the
items in the current working set of the API consumer.

Some pointers:

Batch reading:
org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG

Item info cache size:
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE

Item cache size:
org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE

Related JIRA issues:
JCR-2497: Improve jcr2spi read performance
JCR-2498: Implement caching mechanism for ItemInfo batches
JCR-2461: Item retrieval inefficient after refresh
JCR-2499: Add simple benchmarking tools for jcr2spi read perform

Michael

On 2/28/10 9:21 PM, Paco Avila wrote:
>
> On 28/02/2010 15:50, "Michael Dürig" wrote:
>
> François,
>
> I spent some time on improving performance lately. See
> https://issues.apache.org/jira/browse/JCR-2497 and related issues.
>
> I was able to improve performance for our use case with these fixes. Getting
> the parameters right (i.e. item cache size, item info cache size and batch
> read depth) is still quite tricky though and requires careful profiling.
>
> I can provide more specific information on these parameters if required.
>
> Michael
>
>
> François Cassistat wrote:
>>
>> Ok, I've studied a little what was going on with a packet analyze...
>