jackrabbit-users mailing list archives

From Robert Munteanu <romb...@apache.org>
Subject Re: Node Retrieval Performance
Date Sat, 14 Nov 2015 08:02:39 GMT
On Nov 14, 2015 2:21 AM, "Clay Ferguson" <wclayf@gmail.com> wrote:
>
> In my opinion this one issue is the single most crippling Achilles' heel of
> the entire JCR. It is very likely to drive away many potential users of this
> API. It's touted as an enterprise-scale API, yet it chokes on just a few tens
> of thousands of nodes. This, IMO, urgently needs to be addressed. I know
> it's a technical limitation and not a design decision, but to me that just
> means it's an 'unsolved' problem. I'm not complaining or criticizing
> developers; I'm just saying that as a community we need to solve this.
> Ideally, I should be able to have 50 million nodes without it being a
> problem. RDBMSs solved these issues years ago with a "never load
> everything all at once" rule. However, somehow the "It's OK to load all
> children in memory" mentality caught on in the JCR, and we are now stuck
> with the results.

Note that this usually applies to direct child nodes, i.e. 50k nodes with
the same parent.

Such a number spread throughout the repository is not an issue.
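
For illustration, one way to spread nodes is to derive intermediate "bucket"
nodes from a hash of the content id, so that no single parent accumulates
tens of thousands of direct children. The following is only a minimal sketch
under assumptions not stated in this thread: the class BucketPath, the method
bucketedPath, the /content prefix and the two-level SHA-256 layout are all
hypothetical, and the content id is assumed to be a valid JCR node name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; it is not part of Jackrabbit.
final class BucketPath {

    private BucketPath() {
    }

    // Maps a content id to an absolute path with two intermediate bucket levels.
    static String bucketedPath(String contentId) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(contentId.getBytes(StandardCharsets.UTF_8));
            // One hex-encoded byte per level: at most 256 children per bucket
            // node, 256 x 256 leaf buckets in total.
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);
            return "/content/" + level1 + "/" + level2 + "/" + contentId;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the bucketed path for an example (made-up) content id.
        System.out.println(bucketedPath("doc-000123"));
    }
}
```

A producer could then create the node at that path instead of directly under
the root, for example with JcrUtils.getOrCreateByPath from
jackrabbit-jcr-commons if it is available in your version. A consumer can
recompute the same path from the content id alone and call
session.getNode(path), so neither side has to list the children of a large
parent.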

Robert

>
>
> Best regards,
> Clay Ferguson
> wclayf@gmail.com
>
>
> On Fri, Nov 13, 2015 at 4:47 PM, Dirk Rudolph <dirk.rudolph@netcentric.biz> wrote:
>
> > Did I understand you correctly: you have thousands of child nodes below
> > the root node?
> >
> > You should avoid this. It is considered bad practice in terms of write
> > performance and, depending on your concurrent access, it might also
> > block read access.
> >
> > http://wiki.apache.org/jackrabbit/Performance
> >
> > Try to introduce a structure to your content using BTreeManager:
> >
> > https://jackrabbit.apache.org/api/2.10/org/apache/jackrabbit/commons/flat/BTreeManager.html
> >
> > Cheers, D
> >
> >
> > On Friday, 13 November 2015, David Marginian <david@butterdev.com> wrote:
> >
> > > Thanks Clay.  I am not trying to load that many records at once.  The
> > > application is crawling a directory.  It places the files from that
> > > directory into JackRabbit one at a time, and puts a content id onto a
> > > queue which is picked up by consumers on different servers.  Those
> > > consumers then use the content id to retrieve the file from JackRabbit.
> > > Each piece of content is saved in a node under the root node.  The
> > > performance slowdown is coming from calling session.getRootNode(); from
> > > what I can gather from the docs, I need the root node in order to add a
> > > child node.  Note that the slowdown is pretty significant, and I don't
> > > need to have close to 50k nodes to start seeing it (I start seeing it
> > > within a few minutes of running my app).  I don't need orderable nodes;
> > > how do I disable that?
> > >
> > >
> > > On 11/13/2015 03:10 PM, Clay Ferguson wrote:
> > >
> > >> Please let us know more about your use case. Why are you even "trying"
> > >> to load that many records all at once, or at least scan them one by
> > >> one? In most use cases you wouldn't need to do this kind of thing,
> > >> unless it's some kind of backup or replication. I say "most" cases; I'm
> > >> not saying you don't need to, just asking for a bit more background.
> > >> BTW: if you don't need 'orderable' nodes, try to avoid them. That type
> > >> of node does not work at 'scale', and 50K is probably pushing it.
> > >>
> > >> Best regards,
> > >> Clay Ferguson
> > >> wclayf@gmail.com
> > >>
> > >>
> > >> On Fri, Nov 13, 2015 at 3:33 PM, <david@butterdev.com> wrote:
> > >>
> > >>> Hi,
> > >>> I am new to JackRabbit and using version 2.11.2.  I am using JackRabbit
> > >>> to store documents in a multi-threaded environment.  I noticed that the
> > >>> time it takes to retrieve the root node is inconsistent and slow
> > >>> (several seconds or more) and degrades over time (after 50K plus child
> > >>> nodes, retrieval is taking ~15 seconds).
> > >>>
> > >>> Originally, I was using code as follows to obtain a repository:
> > >>>
> > >>>   public Repository getRepository() throws ClassNotFoundException, RepositoryException {
> > >>>       ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
> > >>>       return JcrUtils.getRepository(jackabbitServerUrl);
> > >>>   }
> > >>>
> > >>> Then I came across the following thread:
> > >>>
> > >>> http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302
> > >>>
> > >>> This thread had some useful information (BatchReadConfig), but I am not
> > >>> certain how to use the API to take advantage of it.  I have changed my
> > >>> code to the following, but node retrieval performance does not appear
> > >>> to have improved.  Is there something I am missing or doing wrong?
> > >>>
> > >>> 1) Repository Factory
> > >>> public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters) throws RepositoryException {
> > >>>          String repositoryFactoryName = parameters != null && (
> > >>>                  parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY) ||
> > >>>                  parameters.containsKey(PARAM_REPOSITORY_CONFIG))
> > >>>                  ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
> > >>>                  : "org.apache.jackrabbit.core.RepositoryFactoryImpl";
> > >>>
> > >>>          Object repositoryFactory;
> > >>>          try {
> > >>>              Class<?> repositoryFactoryClass = Class.forName(repositoryFactoryName, true,
> > >>>                      Thread.currentThread().getContextClassLoader());
> > >>>
> > >>>              repositoryFactory = repositoryFactoryClass.newInstance();
> > >>>          }
> > >>>          catch (Exception e) {
> > >>>              throw new RepositoryException(e);
> > >>>          }
> > >>>
> > >>>          if (repositoryFactory instanceof RepositoryFactory) {
> > >>>              return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
> > >>>          }
> > >>>          else {
> > >>>              throw new RepositoryException(repositoryFactory + " is not a RepositoryFactory");
> > >>>          }
> > >>>      }
> > >>>
> > >>> 2) Use the factory to get a repo:
> > >>>   public Repository getRepository() throws ClassNotFoundException, RepositoryException {
> > >>>          Map<String, RepositoryConfig> parameters = Collections.singletonMap(
> > >>>                  "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
> > >>>                  (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));
> > >>>
> > >>>          return getRepository(parameters);
> > >>>      }
> > >>>
> > >>> 3) Repository Config:
> > >>> private static final class RepositoryConfigImpl implements RepositoryConfig {
> > >>>
> > >>>          private String jackabbitServerUrl;
> > >>>
> > >>>          private RepositoryConfigImpl(String jackabbitServerUrl) {
> > >>>              super();
> > >>>              this.jackabbitServerUrl = jackabbitServerUrl;
> > >>>          }
> > >>>
> > >>>          public CacheBehaviour getCacheBehaviour() {
> > >>>              return CacheBehaviour.INVALIDATE;
> > >>>          }
> > >>>
> > >>>          public int getItemCacheSize() {
> > >>>              return 100;
> > >>>          }
> > >>>
> > >>>          public int getPollTimeout() {
> > >>>              return 5000;
> > >>>          }
> > >>>
> > >>>          public RepositoryService getRepositoryService() throws RepositoryException {
> > >>>              BatchReadConfig brc = new BatchReadConfig() {
> > >>>                  public int getDepth(Path path, PathResolver resolver) throws NamespaceException {
> > >>>                      return 1;
> > >>>                  }
> > >>>              };
> > >>>              return new RepositoryServiceImpl(jackabbitServerUrl, brc);
> > >>>          }
> > >>>
> > >>>      }
> > >>>
> > >>> Thanks for your time.
> > >>>
> > >>> David
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >
> >
> > --
> >
> > Dirk Rudolph | Senior Software Engineer
> >
> > Netcentric AG
> >
> > M: +41 79 642 37 11
> > D: +49 174 966 84 34
> >
> > dirk.rudolph@netcentric.biz | www.netcentric.biz
> >
