jackrabbit-users mailing list archives

From <Dirk.Rudo...@t-systems.com>
Subject Best practice: Working with relational references in Jackrabbit with suitable performance
Date Sat, 09 Nov 2013 17:12:22 GMT
Hi all,

We are currently working on an application based on Jackrabbit (CRX) with a lot of content (more than 15,000 documents).

To meet our requirements, we had to create relations between our documents (similar to symlinks on Unix systems). For example, we have a document

/docs/generated/document1

that should be referenced from several other locations, say:

/some/path/refToDocument1
/some/other/path/refToDocument1
/and/another/path/refToDocument1

We implemented this with a property that stores the path of the original document. The reference is resolved in a higher layer of our application.

Now we need to obtain a list of all referenced documents below a given path, filtered by properties stored on the original document itself and conditionally sorted by a set of properties. So far we have tried two approaches:

1. Add a mixin type to the reference nodes so that we can query for them. The query returns an unsorted, unfiltered (very large) result set, which we filter and sort afterwards.

2. Iterate (using multiple threads) over the whole tree below the given path, collecting only the nodes that match the given filter. Sorting is done afterwards.
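
For illustration, approach (2) boils down to something like the following sketch. It is not our actual code: TreeNode and its status property are hypothetical in-memory stand-ins for JCR nodes and the filter property; a real version would iterate the children of a javax.jcr.Node and would have to take care of how the JCR session is used across threads. Each child subtree is handed to its own ForkJoinPool task, and the matches are merged with each node before its descendants:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;

// Sketch of approach (2): a parallel tree walk that keeps only matching nodes.
// TreeNode is a hypothetical in-memory stand-in for a JCR node.
public class TreeCollect {
    static final class TreeNode {
        final String path;
        final String status; // hypothetical filter property
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String path, String status) {
            this.path = path;
            this.status = status;
        }

        // Adds a child and returns this node, so trees can be built fluently.
        TreeNode add(TreeNode child) {
            children.add(child);
            return this;
        }
    }

    // Forks one subtask per child, then appends the child results after the node itself.
    static final class CollectTask extends RecursiveTask<List<TreeNode>> {
        private final TreeNode node;
        private final Predicate<TreeNode> filter;

        CollectTask(TreeNode node, Predicate<TreeNode> filter) {
            this.node = node;
            this.filter = filter;
        }

        @Override
        protected List<TreeNode> compute() {
            List<CollectTask> subtasks = new ArrayList<>();
            for (TreeNode child : node.children) {
                CollectTask task = new CollectTask(child, filter);
                task.fork(); // process the child subtree asynchronously
                subtasks.add(task);
            }
            List<TreeNode> matches = new ArrayList<>();
            if (filter.test(node)) {
                matches.add(node);
            }
            for (CollectTask task : subtasks) {
                matches.addAll(task.join()); // wait for each subtree in child order
            }
            return matches;
        }
    }

    static List<TreeNode> collect(TreeNode root, Predicate<TreeNode> filter) {
        return ForkJoinPool.commonPool().invoke(new CollectTask(root, filter));
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("/some", null)
            .add(new TreeNode("/some/path/refToDocument1", "published"))
            .add(new TreeNode("/some/other", null)
                .add(new TreeNode("/some/other/path/refToDocument1", "draft")));
        for (TreeNode match : collect(root, node -> "published".equals(node.status))) {
            System.out.println(match.path); // prints /some/path/refToDocument1
        }
    }
}
```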

 

Neither solution performed well. In (1) the query took about 900 ms (acceptable for about 10,000 entries in the result set, I think), and the filtering took about 3,000 ms. In (2) the traversal alone took 4,500 ms, including filtering but not yet sorting. So neither solution is suitable for our project, and we are looking for a better way to model this requirement. What is the best way to work with relational content in Jackrabbit?

My last idea for solving the performance issue was to reduce the size of the result set by querying the documents directly and letting Lucene do the filtering and sorting, but this failed because of the complex sorting we have to implement. For example: order by property a when b does not exist, otherwise by b. Is it possible to implement conditional sorting like this using the properties available in the index?
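
To make the required ordering concrete, here is a minimal in-memory sketch of it as a Java Comparator, assuming documents are represented as simple property maps (the map representation and the helper doc() are hypothetical). The effective sort key is b when it exists and a otherwise; documents with neither property are placed last, which is an arbitrary choice for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conditional ordering: sort by property "b" if present, otherwise by property "a".
public class ConditionalSort {
    // Effective sort key of a document; null if it has neither "b" nor "a".
    static String sortKey(Map<String, String> doc) {
        String b = doc.get("b");
        return b != null ? b : doc.get("a");
    }

    // Compares documents by their effective key; documents without a key go last.
    static final Comparator<Map<String, String>> BY_B_ELSE_A =
        Comparator.comparing(ConditionalSort::sortKey,
                             Comparator.nullsLast(Comparator.naturalOrder()));

    // Hypothetical helper: builds a property map from alternating name/value pairs.
    static Map<String, String> doc(String... kv) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            props.put(kv[i], kv[i + 1]);
        }
        return props;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>(Arrays.asList(
            doc("a", "delta"),
            doc("a", "zulu", "b", "alpha"),
            doc("a", "charlie"),
            doc("b", "bravo")));
        docs.sort(BY_B_ELSE_A);
        for (Map<String, String> d : docs) {
            System.out.println(sortKey(d)); // prints alpha, bravo, charlie, delta (one per line)
        }
    }
}
```

Whether the query engine can be told to sort by such a derived key is exactly the question. One possible workaround, untested here, would be to compute the same key with something like sortKey() when a document is saved and store it as an ordinary property, so that the index only has to sort on a single property.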

 

Any other hints on improving performance are very welcome. (We have already increased bundleCacheSize to about 10% of the available heap size ;-).)



Thanks in advance,

Dirk Rudolph  




T-Systems Multimedia Solutions GmbH
Organizational unit CCS
Dirk Rudolph
Software Development, OCJP

Street address: Riesaer Straße 5, 01129 Dresden
Postal address: P.O. Box 10 02 24, 01072 Dresden
Tel.: +49 351 2820-5363
E-Mail: Dirk.Rudolph@t-systems.com
Internet: http://www.t-systems-mms.com