jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "Performance" by MichaelDürig
Date Mon, 08 Mar 2010 14:46:31 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The "Performance" page has been changed by MichaelDürig.
The comment on this change is: Added info on DavEx performance optimization.


  == Q. My XPath query is too slow. ==
- A. Quotes from the mailing list regarding XPath query performance can be found here: [[http://www.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td15028655.html]]
+ A. Quotes from the mailing list regarding XPath query performance can be found here: http://www.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td15028655.html
  Performance of XPath queries is much better with the 1.5 snapshot.
  == Q. I have too many child nodes and performance goes down. ==
  A. The current internal Jackrabbit design is optimized for small to medium-sized child node
sets, i.e. up to ~10k child nodes per node. Really large child node sets negatively affect
write performance.
  Please note that this is not a general issue of JCR but specific to Jackrabbit's current
internal persistence strategy. It applies whether you use a normal persistence
manager or a "bundle" persistence manager, although the latter is recommended; see PersistenceManagerFAQ.
Each node contains the references to all its child nodes. This is a design decision inside
Jackrabbit to improve speed when a node has few child nodes. To improve performance, introduce
some extra levels into your content model. This also helps humans explore the repository
when using a browser tool. Typical solutions are to group content by categories derived from
your data or by date folders, such as "2009/01/09".
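
As a minimal sketch of the date-folder idea, the helper below derives a "2009/01/09"-style intermediate path from a date. The class name `DateFolders`, the method `pathFor`, and the example root path are hypothetical and not part of Jackrabbit; creating the actual intermediate nodes through the JCR API is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: spreads content over year/month/day folders. */
final class DateFolders {
    // Produces e.g. "2009/01/09" so each folder level keeps a small child node set.
    private static final DateTimeFormatter FOLDERS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DateFolders() {
    }

    /** Builds root + "/" + yyyy/MM/dd + "/" + name. */
    static String pathFor(String root, LocalDate date, String name) {
        return root + "/" + date.format(FOLDERS) + "/" + name;
    }
}
```

For example, `DateFolders.pathFor("/content/articles", LocalDate.of(2009, 1, 9), "my-article")` yields `/content/articles/2009/01/09/my-article`, so no single node has to hold every article as a direct child.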
  == Q. I have many references to a single node and performance goes down. ==
  A. The current Jackrabbit design is not optimized for many nodes referencing a single node,
because for easy back-referencing in the JCR API all those references are stored in the target
node. Please note that many people don't recommend references in a content model anyway -
see for example DavidsModel, [[http://wiki.apache.org/jackrabbit/DavidsModel#head-ed794ec9f4f716b3e53548be6dd91b23e5dd3f3a|rule
+ == Q. How can I improve performance with DavEx remoting (jcr2spi / spi2davex)? ==
+ On the current trunk there are three parameters which can be used to tweak performance for jcr2spi/spi2davex:
the size of the item info cache, the size of the item cache and the depth of batch
read operations.
+ Some Background:
+ The item cache contains JCR items (i.e. nodes and properties). The item info cache contains
item infos. An item info is an entity representing a node or a property on the SPI layer. The
jcr2spi module receives item infos from an SPI implementation (e.g. spi2davex) and uses them
to build up a hierarchy of JCR items.
+ When an item is requested from the JCR API, jcr2spi first checks whether the item is in
the item cache. If so, that item is returned. If not, the request is passed down to the SPI.
But before actually calling the SPI, the item info cache is checked first. If this cache contains
the requested item info, the relevant part of the JCR hierarchy is built and the corresponding
JCR item is placed into the item cache. Only when the item info cache does not contain the
requested item info is a call made to the SPI. Here the batch read depth comes into play.
Since calls to the SPI cause some latency (i.e. network round trips), the SPI may return
additional item infos along with the one actually requested. The batch read depth parameter
specifies the depth down to which item infos of the children of the requested item info are
returned.
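
The lookup order described above can be modeled with a small, self-contained sketch. It is an illustration only and not the jcr2spi implementation: `InfoSource` and `TwoLevelLookup` are hypothetical names, items and item infos are plain strings, and both caches are unbounded maps, so cache sizes and eviction are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an SPI implementation such as spi2davex. */
interface InfoSource {
    /** Returns item infos (path -> info) for the path and its descendants down to the given depth. */
    Map<String, String> fetch(String path, int depth);
}

/** Illustrative model of the lookup order: item cache, then item info cache, then SPI. */
final class TwoLevelLookup {
    private final Map<String, String> itemCache = new HashMap<>();
    private final Map<String, String> itemInfoCache = new HashMap<>();
    private final InfoSource spi;
    private final int batchReadDepth;
    private int spiCalls = 0;

    TwoLevelLookup(InfoSource spi, int batchReadDepth) {
        this.spi = spi;
        this.batchReadDepth = batchReadDepth;
    }

    /** Returns the item for the path, or null if the SPI does not know it. */
    String getItem(String path) {
        // 1. Item cache hit: return the item directly.
        String item = itemCache.get(path);
        if (item != null) {
            return item;
        }
        // 2. Item info cache miss: one SPI round trip, which may also
        //    deliver infos for descendants down to batchReadDepth.
        String info = itemInfoCache.get(path);
        if (info == null) {
            spiCalls++;
            itemInfoCache.putAll(spi.fetch(path, batchReadDepth));
            info = itemInfoCache.get(path);
        }
        if (info == null) {
            return null;
        }
        // 3. Build the item from its info and remember it in the item cache.
        item = "item(" + info + ")";
        itemCache.put(path, item);
        return item;
    }

    /** Number of SPI round trips made so far. */
    int spiCalls() {
        return spiCalls;
    }
}
```

With a batch read depth of 2, requesting `/a` and then `/a/b/c` costs a single SPI call in this model, because the second item info arrived with the first batch; requesting `/a/b/c/d` afterwards needs a second call.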
+ Overall the size of the item info cache and the batch read depth should be used to optimize
for the requirements of the back-end (i.e. network and server). In general, the item info
cache should be large enough to '''easily''' hold all items from multiple batches. The batch
read depth should be a trade-off between network latency and item info cache overhead. Finally,
the item cache should be used to optimize for the requirements of the front-end (i.e. the JCR
API client). It should be able to hold the items in the current working set of the API consumer.
+ Some pointers:
+ Batch reading: org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
+ org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG
+ Item info cache size:
+ org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE
+ Item cache size:
+ org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE
+ Related JIRA issues:
+ JCR-2497: Improve jcr2spi read performance
+ JCR-2498: Implement caching mechanism for ItemInfo batches
+ JCR-2461: Item retrieval inefficient after refresh
+ JCR-2499: Add simple benchmarking tools for jcr2spi read perform
