jackrabbit-users mailing list archives

From David Marginian <da...@butterdev.com>
Subject Re: Node Retrieval Performance
Date Sat, 14 Nov 2015 00:26:12 GMT
Thanks Dirk, I should have found that page on my own.  I am going to
look into using the BTreeManager.  Just curious: what are the limits on
the number of documents/files within a single node?  I am planning on
storing a lot of data in Jackrabbit (terabytes).  Also, is the
configuration code I posted in my previous post the best way to do
things, or can I simplify it and just do something like this to get a
repo:

ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));


return JcrUtils.getRepository(jackabbitServerUrl);
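
For what it's worth, the "introduce a structure" advice quoted below can
also be followed without BTreeManager, by deriving a nested path from
the content id so that no single node ends up with thousands of
children.  What follows is only a minimal sketch under my own
assumptions, not anything taken from the Jackrabbit docs: the class name
ContentPaths, the "/content" parent node, and the two-level SHA-256
bucketing are all hypothetical choices, and the code uses nothing but
the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: maps a content id to a nested repository path. */
final class ContentPaths {

    // Hypothetical parent node under which all content would live.
    private static final String BASE = "/content";

    private ContentPaths() {
    }

    /**
     * Returns a path of the form /content/xx/yy/<contentId>, where xx and yy
     * are the first two bytes of the SHA-256 digest of the id, in hex.
     * Only '/' is rejected here; other characters that are illegal in JCR
     * names are left for the caller to handle.
     */
    static String bucketedPath(String contentId) {
        if (contentId == null || contentId.isEmpty() || contentId.contains("/")) {
            throw new IllegalArgumentException(
                    "content id must be a non-empty name without '/'");
        }
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(contentId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return BASE + "/" + level1 + "/" + level2 + "/" + contentId;
    }

    public static void main(String[] args) {
        // The same id always maps to the same path, so a consumer can
        // recompute the path from the content id it takes off the queue.
        System.out.println(bucketedPath("file-000001"));
        System.out.println(bucketedPath("file-000002"));
    }
}
```

With two hex levels, each intermediate node has at most 256 children and
the content is spread over up to 65,536 buckets.  The producer would
still have to create the intermediate nodes before adding the content
node, and the consumer could presumably fetch the node by its absolute
path via the session instead of going through the root node; I have not
measured whether that removes the slowdown.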

On 11/13/2015 03:47 PM, Dirk Rudolph wrote:
> Did I understand you right: you have thousands of child nodes below the
> root node?
>
> You should avoid this because this is considered bad practice in terms of
> write performance and depending on your concurrent access this might also
> block read access.
>
> http://wiki.apache.org/jackrabbit/Performance
>
> Try to introduce a structure to your content using BTreeManager
>
>
> https://jackrabbit.apache.org/api/2.10/org/apache/jackrabbit/commons/flat/BTreeManager.html
>
> Cheers, D
>
>
> On Friday, 13 November 2015, David Marginian <david@butterdev.com> wrote:
>
>> Thanks Clay.  I am not trying to load that many records at once.  The
>> application is crawling a directory.  It places the files from that
>> directory into Jackrabbit one at a time, and puts a content id onto a queue
>> which is picked up by consumers on different servers.  Those consumers then
>> use the content id to retrieve the file from Jackrabbit.  Each piece of
>> content is saved in a node under the root node.  The performance slowdown
>> is coming from calling session.getRootNode(); from what I can gather from
>> the docs, I need the root node in order to add a child node.  Note the
>> slowdown is pretty significant, and I don't need to be close to 50k nodes
>> to start seeing it (I start seeing it within a few minutes of running my
>> app).  I don't need orderable nodes; how do I disable that?
>>
>>
>> On 11/13/2015 03:10 PM, Clay Ferguson wrote:
>>
>>> Please let us know more about your use case. Why are you even "trying" to
>>> load that many records all at once, or at least scan them one by one? In
>>> most use cases you wouldn't need to do this kind of thing, unless it's
>>> some kind of backup or replication. I say "most" cases... I'm not saying
>>> you don't need to, just asking for a bit more background. BTW: If you
>>> don't need 'orderable' nodes, try to avoid them. That type of node does
>>> not work at 'scale'... and 50K is probably pushing it.
>>>
>>> Best regards,
>>> Clay Ferguson
>>> wclayf@gmail.com
>>>
>>>
>>> On Fri, Nov 13, 2015 at 3:33 PM, <david@butterdev.com> wrote:
>>>
>>>> Hi,
>>>> I am new to Jackrabbit and using version 2.11.2.  I am using Jackrabbit to
>>>> store documents in a multi-threaded environment.  I noticed that the time
>>>> it takes to retrieve the root node is inconsistent and slow (several
>>>> seconds or more) and degrades over time (after 50K plus child nodes,
>>>> retrieval is taking ~15 seconds).
>>>>
>>>> Originally, I was using code as follows to obtain a repository:
>>>>
>>>>    public Repository getRepository() throws ClassNotFoundException,
>>>>            RepositoryException {
>>>>        ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
>>>>        return JcrUtils.getRepository(jackabbitServerUrl);
>>>>    }
>>>>
>>>> Then I came across the following thread:
>>>>
>>>>
>>>> http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302
>>>>
>>>> This thread had some useful information (BatchReadConfig), but I am not
>>>> certain how to use the API to take advantage of it.  I have changed my
>>>> code to the following, but it doesn't appear that node retrieval
>>>> performance has improved.  Is there something I am missing/doing wrong?
>>>>
>>>> 1) Repository Factory
>>>> public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters)
>>>>         throws RepositoryException {
>>>>     String repositoryFactoryName = parameters != null
>>>>             && (parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY)
>>>>                     || parameters.containsKey(PARAM_REPOSITORY_CONFIG))
>>>>             ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
>>>>             : "org.apache.jackrabbit.core.RepositoryFactoryImpl";
>>>>
>>>>     Object repositoryFactory;
>>>>     try {
>>>>         Class<?> repositoryFactoryClass = Class.forName(repositoryFactoryName, true,
>>>>                 Thread.currentThread().getContextClassLoader());
>>>>
>>>>         repositoryFactory = repositoryFactoryClass.newInstance();
>>>>     }
>>>>     catch (Exception e) {
>>>>         throw new RepositoryException(e);
>>>>     }
>>>>
>>>>     if (repositoryFactory instanceof RepositoryFactory) {
>>>>         return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
>>>>     }
>>>>     else {
>>>>         throw new RepositoryException(repositoryFactory + " is not a RepositoryFactory");
>>>>     }
>>>> }
>>>>
>>>> 2) Use the factory to get a repo:
>>>> public Repository getRepository() throws ClassNotFoundException,
>>>>         RepositoryException {
>>>>     Map<String, RepositoryConfig> parameters = Collections.singletonMap(
>>>>             "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
>>>>             (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));
>>>>
>>>>     return getRepository(parameters);
>>>> }
>>>>
>>>> 3) Repository Config:
>>>> private static final class RepositoryConfigImpl implements RepositoryConfig {
>>>>
>>>>     private String jackabbitServerUrl;
>>>>
>>>>     private RepositoryConfigImpl(String jackabbitServerUrl) {
>>>>         super();
>>>>         this.jackabbitServerUrl = jackabbitServerUrl;
>>>>     }
>>>>
>>>>     public CacheBehaviour getCacheBehaviour() {
>>>>         return CacheBehaviour.INVALIDATE;
>>>>     }
>>>>
>>>>     public int getItemCacheSize() {
>>>>         return 100;
>>>>     }
>>>>
>>>>     public int getPollTimeout() {
>>>>         return 5000;
>>>>     }
>>>>
>>>>     public RepositoryService getRepositoryService() throws RepositoryException {
>>>>         BatchReadConfig brc = new BatchReadConfig() {
>>>>             public int getDepth(Path path, PathResolver resolver)
>>>>                     throws NamespaceException {
>>>>                 return 1;
>>>>             }
>>>>         };
>>>>         return new RepositoryServiceImpl(jackabbitServerUrl, brc);
>>>>     }
>>>>
>>>> }
>>>>
>>>> Thanks for your time.
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>

