jackrabbit-users mailing list archives

From Michael Dürig <michael.due...@day.com>
Subject Re: jcr2spi NodeIterator.getNode() performances
Date Thu, 04 Mar 2010 16:48:23 GMT


On 3/4/10 4:55 PM, Paco Avila wrote:
> Thanks for the info :)
>
> PS: This info should be included in the Wiki.

Yes, I'll see what I can do.
Michael

>
>
> On Thu, Mar 4, 2010 at 2:30 PM, Michael Dürig <michael.duerig@day.com> wrote:
>>> I am interested in these parameters to improve jackrabbit performance. I
>>> have an installation with more than 2 million documents and performance
>>> is actually poor :(
>>
>> On the current trunk there are 3 parameters which can be used to tweak
>> performance for jcr2spi/spi2davex. These are the size of the item info
>> cache, the size of the item cache and the depth of batch read operations.
>>
>>
>> Some Background:
>> The item cache contains JCR items (i.e. nodes and properties). The item info
>> cache contains item infos. An item info is an entity representing nodes or
>> properties on the SPI layer. The jcr2spi module receives item infos from an
>> SPI implementation (e.g. spi2davex) and uses them to build up a hierarchy of
>> JCR items.
>>
>> When an item is requested from the JCR API, jcr2spi first checks whether the
>> item is in the item cache. If so, that item is returned. If not, the request
>> is passed down to the SPI. But before actually calling the SPI, the item info
>> cache is checked first. If this cache contains the requested item info, the
>> relevant part of the JCR hierarchy is built and the corresponding JCR item
>> is placed into the item cache. Only when the item info cache does not
>> contain the requested item info is a call made to the SPI. Here the
>> batch read depth comes into play. Since calls to the SPI cause some latency
>> (e.g. network round trips), the SPI may - in addition to the actually
>> requested item info - return additional item infos. The batch read depth
>> parameter specifies the depth down to which item infos of the children of
>> the requested item info are returned.
>>
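The lookup order described above can be sketched as follows. This is a simplified illustration, not the jcr2spi implementation: the class `TwoLevelLookup`, the `Lru` helper, the use of plain strings for items and item infos, and the `spi` function standing in for a batch read are all hypothetical, and LRU eviction is an assumption made only to keep the caches bounded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class TwoLevelLookup {
    /** Minimal LRU map (assumed eviction policy, for illustration only). */
    static final class Lru<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        Lru(int capacity) {
            super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    private final Map<String, String> itemCache;      // stands in for built JCR items
    private final Map<String, String> itemInfoCache;  // stands in for SPI item infos
    // Hypothetical batch read: returns the requested info plus some descendants.
    private final Function<String, Map<String, String>> spi;
    int spiCalls = 0;                                 // counts simulated round trips

    TwoLevelLookup(int itemCacheSize, int itemInfoCacheSize,
                   Function<String, Map<String, String>> spi) {
        this.itemCache = new Lru<>(itemCacheSize);
        this.itemInfoCache = new Lru<>(itemInfoCacheSize);
        this.spi = spi;
    }

    String getItem(String path) {
        // 1. Item cache: return the already built item if present.
        String item = itemCache.get(path);
        if (item != null) {
            return item;
        }
        // 2. Item info cache: avoid the SPI call if the info is already there.
        String info = itemInfoCache.get(path);
        if (info == null) {
            // 3. SPI call: one round trip may return a whole batch of infos.
            spiCalls++;
            Map<String, String> batch = spi.apply(path);
            itemInfoCache.putAll(batch);
            info = batch.get(path);
            if (info == null) {
                return null; // the item does not exist
            }
        }
        // 4. Build the item from its info and remember it in the item cache.
        item = "item:" + info;
        itemCache.put(path, item);
        return item;
    }
}
```

With a fake `spi` that returns the requested path together with one child, requesting the parent and then the child triggers only a single simulated SPI call, because the child's info is already in the item info cache.
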
>> Overall the size of the item info cache and the batch read depth should be
>> used to optimize for the requirements of the back-end (i.e. network and
>> server). In general, the item info cache should be large enough to *easily*
>> hold all item infos from multiple batches. The batch read depth should be a
>> trade-off between network latency and item info cache overhead. Finally, the item
>> cache should be used to optimize for the requirements of the front-end (i.e.
>> the JCR API client). It should be able to hold the items in the current
>> working set of the API consumer.
>>
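As a rough aid for the sizing advice above, the following sketch estimates how many item infos a single batch might contain and how large the item info cache would need to be to hold several batches. It rests on assumptions that are not from the thread: a uniform number of child nodes per node (`fanOut`), a fixed number of properties per node (`propsPerNode`), a batch that includes property infos as well as node infos, and depth 0 meaning only the requested node. Real repositories are rarely this regular, so profiling is still needed.

```java
class BatchSizeEstimate {
    /** Nodes in a subtree of the given depth: 1 + f + f^2 + ... + f^depth. */
    static long nodesPerBatch(int fanOut, int depth) {
        long total = 0;
        long level = 1; // number of nodes on the current level
        for (int d = 0; d <= depth; d++) {
            total += level;
            level *= fanOut;
        }
        return total;
    }

    /** Item infos per batch: one per node plus one per property (assumed). */
    static long itemInfosPerBatch(int fanOut, int depth, int propsPerNode) {
        return nodesPerBatch(fanOut, depth) * (1 + propsPerNode);
    }

    /** Cache size needed to hold the given number of whole batches. */
    static long infoCacheSizeFor(int fanOut, int depth, int propsPerNode, int batchesToHold) {
        return itemInfosPerBatch(fanOut, depth, propsPerNode) * batchesToHold;
    }
}
```

For example, under these assumptions a fan-out of 10 and a batch read depth of 2 give 111 nodes per batch; with 5 properties per node that is 666 item infos, and holding four such batches would take a cache of 2664 entries. Increasing the depth by one multiplies the batch size by roughly the fan-out, which is why depth and cache size have to be tuned together.
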
>> Some pointers:
>>
>> Batch reading:
>> org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
>> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG
>>
>> Item info cache size:
>> org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE
>>
>> Item cache size:
>> org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE
>>
>> Related JIRA issues:
>> JCR-2497: Improve jcr2spi read performance
>> JCR-2498: Implement caching mechanism for ItemInfo batches
>> JCR-2461: Item retrieval inefficient after refresh
>> JCR-2499: Add simple benchmarking tools for jcr2spi read perform
>>
>> Michael
>>
>> On 2/28/10 9:21 PM, Paco Avila wrote:
>>>
>>> On 28/02/2010 15:50, "Michael Dürig" <michael.duerig@day.com> wrote:
>>>
>>> François,
>>>
>>> I spent some time on improving performance lately. See
>>> https://issues.apache.org/jira/browse/JCR-2497 and related issues.
>>>
>>> I was able to improve performance for our use case with these fixes. Getting
>>> the parameters right (i.e. item cache size, item info cache size and batch
>>> read depth) is still quite tricky though and requires careful profiling.
>>>
>>> I can provide more specific information on these parameters if required.
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>>
>>>
>>> François Cassistat wrote:
>>>>
>>>> Ok, I've studied a little what was going on with a packet analyze...
>>>
>>
>
>
>
