jackrabbit-dev mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] Commented: (JCR-2498) Implement caching mechanism for ItemInfo batches
Date Tue, 16 Feb 2010 15:55:28 GMT

    [ https://issues.apache.org/jira/browse/JCR-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834272#action_12834272 ]

Michael Dürig commented on JCR-2498:
------------------------------------

Some more numbers demonstrating the effect, with JCR-2498-poc.patch applied. The 'new/old time'
row gives the ratio of the request times with the patch applied to those without the patch.
The 'new/old rts' row gives the ratio of the number of network round trips with the patch
applied to that without the patch.

The first measurement includes all operations (getItem, getNode, getProperty and refresh)
as above. 

Batch size:    24340 12170  6085  3043  1521   761   380   190    95    48    24    12     6     3     1
new/old time:    0.1   0.1   0.1   0.1   0.2   0.3   0.4   0.5   0.5   0.7   0.6     1     1   1.1   0.8
new/old rts:     2.1   2.8   1.8   2.4   1.8   1.4   1.3   1.2     1   1.1     1     1   0.9     1   0.9

Most obvious is the large performance gain (up to a factor of 10) when reading items. However,
this comes with an increase in the number of network round trips. Three things should be
noted here:

1. For realistic batch sizes the increase in the number of network round trips is much
   less pronounced.
2. The increase in the number of network round trips is caused by the refresh operations.
   In the test scenario the number of refresh operations is unrealistically high (every
   fourth operation is a refresh).
3. The items in the batches of the test case are not realistically distributed across the
   items of the repository: they are chosen at random. In practice, however, the items in
   a batch would be related to each other by some locality criterion. I assume that this
   would further mitigate the observed effect.
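The factor-of-10 figure follows from inverting the 'new/old time' quotients. The Java sketch below does this for the first measurement; the class and method names are illustrative and not part of the patch. Only the first measurement is used, because the second one contains quotients rounded to 0, which cannot be inverted meaningfully. Since the quotients are rounded to one decimal, the resulting speedups are approximate.

```java
import java.util.Locale;

public class SpeedupFromRatios {
    // Batch sizes and 'new/old time' quotients copied from the first
    // measurement (all operations, including refresh).
    static final int[] BATCH_SIZES =
        {24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1};
    static final double[] TIME_RATIO =
        {0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.6, 1, 1, 1.1, 0.8};

    // A quotient q = t_new / t_old means the patched version is 1 / q times as fast.
    static double speedup(double ratio) {
        return 1.0 / ratio;
    }

    // Largest speedup over all batch sizes (quotients must be non-zero).
    static double maxSpeedup(double[] ratios) {
        double best = 0.0;
        for (double r : ratios) {
            best = Math.max(best, speedup(r));
        }
        return best;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BATCH_SIZES.length; i++) {
            System.out.printf(Locale.ROOT, "batch %5d: %.1fx%n",
                BATCH_SIZES[i], speedup(TIME_RATIO[i]));
        }
        System.out.printf(Locale.ROOT, "max speedup: %.1fx%n", maxSpeedup(TIME_RATIO));
    }
}
```

A value below 1x (for example about 0.9x at batch size 3, from the quotient 1.1) is a slowdown rather than a speedup.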

For completeness' sake, here is the same measurement as above, but without refresh operations:

Batch size:    24340 12170  6085  3043  1521   761   380   190    95    48    24    12     6     3     1
new/old time:    0.2     0     0   0.1   0.1   0.2   0.4   0.4   0.6   0.6   0.7     1     1     1   1.1
new/old rts:       1     1   0.9   0.9   0.8   0.9   0.9   0.9   0.9     1     1     1     1     1     1
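The claim that the extra round trips come from the refresh operations can be checked directly against the two 'new/old rts' rows. The Java sketch below (illustrative names, values copied from the two measurements) counts the batch sizes at which the patch needs more round trips than the unpatched code, that is, where the quotient is above 1.

```java
public class RefreshRoundTrips {
    static final int[] BATCH_SIZES =
        {24340, 12170, 6085, 3043, 1521, 761, 380, 190, 95, 48, 24, 12, 6, 3, 1};
    // 'new/old rts' quotients from the first measurement (with refresh operations).
    static final double[] RTS_WITH_REFRESH =
        {2.1, 2.8, 1.8, 2.4, 1.8, 1.4, 1.3, 1.2, 1, 1.1, 1, 1, 0.9, 1, 0.9};
    // 'new/old rts' quotients from the second measurement (without refresh operations).
    static final double[] RTS_WITHOUT_REFRESH =
        {1, 1, 0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1};

    // Number of batch sizes at which the patch needs more round trips (quotient > 1).
    static int countIncreases(double[] rtsRatios) {
        int count = 0;
        for (double r : rtsRatios) {
            if (r > 1.0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BATCH_SIZES.length; i++) {
            System.out.println("batch " + BATCH_SIZES[i]
                + ": with refresh " + RTS_WITH_REFRESH[i]
                + ", without refresh " + RTS_WITHOUT_REFRESH[i]);
        }
        System.out.println("increases with refresh:    " + countIncreases(RTS_WITH_REFRESH));
        System.out.println("increases without refresh: " + countIncreases(RTS_WITHOUT_REFRESH));
    }
}
```

With refresh operations, 9 of the 15 batch sizes show more round trips; without them, none do.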


> Implement caching mechanism for ItemInfo batches
> ------------------------------------------------
>
>                 Key: JCR-2498
>                 URL: https://issues.apache.org/jira/browse/JCR-2498
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-jcr2spi, jackrabbit-spi
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>         Attachments: JCR-2498-poc.patch
>
>
> Currently all ItemInfos returned by RepositoryService#getItemInfos are placed into the
> hierarchy right away. For big batch sizes this is prohibitively expensive. The overhead is
> so great (*) that it quickly outweighs the overhead of network round trips. Moreover, SPI
> implementations usually choose the batch in a way determined by the backing persistence store
> and not by the requirements of the consuming application on the JCR side. That is, many of
> the items in the batch might never actually be needed.
> I suggest implementing a cache for ItemInfo batches. Conceptually, such a cache would
> live inside jcr2spi right above the SPI API. The actual implementation would be provided by
> SPI implementations. This approach allows for fine-tuning cache/batch sizes to a given persistence
> store and network environment. It would also better separate concerns: the purpose
> of the existing item cache is to optimize for the requirements of the consumer of the JCR API
> ('the application'), whereas the new ItemInfo cache would optimize for the specific network
> environment and backing persistence store.
> (*) Numbers follow 
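The proposed cache can be illustrated with a minimal Java sketch. All names in it (ItemInfoCache, LruItemInfoCache, Info, resolve) are hypothetical and are not the Jackrabbit SPI API; an ItemInfo is reduced to a string id, and a plain LRU policy stands in for whatever eviction strategy an SPI implementation would actually choose. Items from a fetched batch go into the cache rather than straight into the hierarchy, and only the requested item is returned.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ItemInfoCacheSketch {

    // Hypothetical stand-in for an SPI ItemInfo: only its id matters here.
    static final class Info {
        final String id;

        Info(String id) {
            this.id = id;
        }
    }

    // Hypothetical cache interface, conceptually located in jcr2spi right above
    // the SPI; an SPI implementation would supply the implementation.
    interface ItemInfoCache {
        Optional<Info> get(String id);

        void put(Info info);
    }

    // LRU cache: LinkedHashMap in access order evicts the least recently used
    // entry once the capacity is exceeded.
    static final class LruItemInfoCache implements ItemInfoCache {
        private final Map<String, Info> entries;

        LruItemInfoCache(int capacity) {
            this.entries = new LinkedHashMap<String, Info>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Info> eldest) {
                    return size() > capacity;
                }
            };
        }

        @Override
        public Optional<Info> get(String id) {
            return Optional.ofNullable(entries.get(id));
        }

        @Override
        public void put(Info info) {
            entries.put(info.id, info);
        }
    }

    // Serve the item from the cache if an earlier batch contained it. Otherwise
    // fetch a batch (one round trip), cache all of it instead of materializing
    // every ItemInfo, and return only the requested item.
    static Info resolve(String id, ItemInfoCache cache,
                        Function<String, Iterable<Info>> fetchBatch) {
        Optional<Info> cached = cache.get(id);
        if (cached.isPresent()) {
            return cached.get();
        }
        Info requested = null;
        for (Info info : fetchBatch.apply(id)) {
            cache.put(info);
            if (info.id.equals(id)) {
                requested = info;
            }
        }
        if (requested == null) {
            throw new IllegalStateException("batch did not contain " + id);
        }
        return requested;
    }

    public static void main(String[] args) {
        int[] roundTrips = {0};
        // Hypothetical batch fetcher: returns the requested item plus two children.
        Function<String, Iterable<Info>> fetch = id -> {
            roundTrips[0]++;
            return List.of(new Info(id), new Info(id + "/a"), new Info(id + "/b"));
        };
        ItemInfoCache cache = new LruItemInfoCache(100);
        resolve("/n", cache, fetch);
        resolve("/n/a", cache, fetch); // cache hit: no further round trip
        System.out.println("round trips: " + roundTrips[0]);
    }
}
```

Because the cache sits behind an interface, an SPI implementation could size it (here, the capacity argument) to match its own batch sizes and network environment, which is the separation of concerns the issue description argues for.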

-- 
This message is automatically generated by JIRA.

