Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@jackrabbit.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAN24SXm3p0M2ShzJOJ6Z+TT6NVNzQJm+ziozXoVT+BH-wxJ37A@mail.gmail.com>
References: <a6679a7f42e3285f9f6e9819ff8e0fa6@butterdev.com>
	<CAN24SX=NjF1BCsWPZ08Bj47fpA=Mt_9eo4HdU1CLb9Lsh=g-Ow@mail.gmail.com>
	<5646657A.4080502@butterdev.com>
	<CADqNnVo8MbyMkGhkGGhL895FDS3Eg6oiki9bkAYHEQmm4QvaVA@mail.gmail.com>
	<CAN24SXm3p0M2ShzJOJ6Z+TT6NVNzQJm+ziozXoVT+BH-wxJ37A@mail.gmail.com>
Date: Sat, 14 Nov 2015 10:02:39 +0200
Message-ID: 
 <CAC8ULPZZBjsUN4Am5DPzGDoygRSGQFjMgu6Hso4w8mmy2pcf1w@mail.gmail.com>
Subject: Re: Node Retrieval Performance
From: Robert Munteanu <rombert@apache.org>
To: users@jackrabbit.apache.org
Content-Type: multipart/alternative; boundary=047d7b3442069703ac05247b9973

--047d7b3442069703ac05247b9973
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Nov 14, 2015 2:21 AM, "Clay Ferguson" <wclayf@gmail.com> wrote:
>
> In my opinion this one issue is the single most crippling Achilles' heel of
> the entire JCR. Very likely to drive away many potential users of this API.
> It's touted as an enterprise-scale API, yet it chokes on just a few tens
> of thousands of nodes. This, IMO, urgently needs to be addressed. I know
> it's a technical limitation, and not a design decision, but to me that just
> means it's an 'unsolved' problem. I'm not complaining or criticizing
> developers, I'm just saying that as a community we need to solve this. In an
> ideal situation I should be able to have 50 million nodes without it being
> a problem. RDBMSs solved these issues years ago with a "never load
> everything all at once" rule. However, somehow the "It's OK to load all
> children in memory" mentality caught on in the JCR and we are now stuck
> with the results.

Note that this usually applies to direct child nodes, i.e. 50k nodes with
the same parent.

Such a number spread throughout the repository is not an issue.
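
A minimal sketch of what "spread throughout the repository" can look like,
assuming the content id is usable as a JCR node name. The class name
BucketedPath, the /content prefix and the two-level layout are illustrative
assumptions, not something taken from this thread or from the Jackrabbit API.
It derives two intermediate folder names from a hash of the id, so that no
single parent ends up holding all of the nodes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: maps a content id to a two-level bucketed path. */
public final class BucketedPath {

    private BucketedPath() {
    }

    /**
     * Returns a path of the form /content/xx/yy/contentId, where xx and yy
     * are the first two bytes of the SHA-256 hash of the id, in hex.
     * That gives 256 folders per level, so no parent at either bucket
     * level has more than 256 bucket children.
     */
    public static String pathFor(String contentId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(contentId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return "/content/" + level1 + "/" + level2 + "/" + contentId;
    }
}
```

The resulting path could then be created with something like
JcrUtils.getOrCreateByPath from jackrabbit-jcr-commons rather than adding
every node directly under the root. Consumers can recompute the same path from
the content id alone, so nothing extra needs to be put on the queue.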

Robert

>
>
> Best regards,
> Clay Ferguson
> wclayf@gmail.com
>
>
> On Fri, Nov 13, 2015 at 4:47 PM, Dirk Rudolph <dirk.rudolph@netcentric.biz>
> wrote:
>
> > Did I understand you correctly: you have thousands of child nodes below
> > the root node?
> >
> > You should avoid this because it is considered bad practice in terms of
> > write performance and, depending on your concurrent access, it might also
> > block read access.
> >
> > http://wiki.apache.org/jackrabbit/Performance
> >
> > Try to introduce a structure to your content using BTreeManager:
> >
> > https://jackrabbit.apache.org/api/2.10/org/apache/jackrabbit/commons/flat/BTreeManager.html
> >
> > Cheers, D
> >
> >
> > On Friday, 13 November 2015, David Marginian <david@butterdev.com> wrote:
> >
> > > Thanks Clay.  I am not trying to load that many records at once.  The
> > > application is crawling a directory.  It places the files from that
> > > directory into JackRabbit one at a time, and puts a content id onto a
> > > queue which is picked up by consumers on different servers.  Those
> > > consumers then use the content id to retrieve the file from JackRabbit.
> > > Each piece of content is saved in a node under the root node.  The
> > > performance slowdown is coming from calling session.getRootNode(), from
> > > what I can gather from the docs I need the root node in order to add a
> > > child node.  Note the slowdown is pretty significant and I don't need to
> > > have close to 50k to start seeing it (I start seeing it within a few
> > > minutes of running my app).  I don't need orderable nodes, how do I
> > > disable that?
> > >
> > >
> > > On 11/13/2015 03:10 PM, Clay Ferguson wrote:
> > >
> > >> Please let us know more about your use case. Why are you even "trying"
> > >> to load that many records all at once? Or at least scan them one by
> > >> one, I mean. In most use cases you wouldn't need to do this kind of
> > >> thing, unless it's some kind of backup or replication. I say "most"
> > >> cases... I'm not saying you don't need to, just asking for a bit more
> > >> background. BTW: If you don't need 'orderable' nodes try to avoid them.
> > >> That type of node does not work at 'scale'... and 50K is probably
> > >> pushing it.
> > >>
> > >> Best regards,
> > >> Clay Ferguson
> > >> wclayf@gmail.com
> > >>
> > >>
> > >> On Fri, Nov 13, 2015 at 3:33 PM, <david@butterdev.com> wrote:
> > >>
> > >> Hi,
> > >>> I am new to JackRabbit and using version 2.11.2.  I am using JackRabbit
> > >>> to store documents in a multi-threaded environment.  I noticed that the
> > >>> time it takes to retrieve the root node is inconsistent and slow
> > >>> (several seconds +) and degrades over time (after 50K plus child nodes
> > >>> retrieval is taking ~15 seconds).
> > >>>
> > >>> Originally, I was using code as follows to obtain a repository:
> > >>>
> > >>>   public Repository getRepository()
> > >>>           throws ClassNotFoundException, RepositoryException {
> > >>>       ServiceLoader.load(Class.forName(
> > >>>               "org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
> > >>>       return JcrUtils.getRepository(jackabbitServerUrl);
> > >>>   }
> > >>>
> > >>> Then I came across the following thread:
> > >>>
> > >>>
> > >>> http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302
> > >>>
> > >>> This thread had some useful information (BatchReadConfig), but I am not
> > >>> certain how to use the API to take advantage of it.  I have changed my
> > >>> code to the following but it doesn't appear that node retrieval
> > >>> performance has improved. Is there something I am missing/doing wrong?
> > >>>
> > >>> 1) Repository Factory
> > >>> public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters)
> > >>>         throws RepositoryException {
> > >>>     String repositoryFactoryName = parameters != null && (
> > >>>             parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY) ||
> > >>>             parameters.containsKey(PARAM_REPOSITORY_CONFIG))
> > >>>             ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
> > >>>             : "org.apache.jackrabbit.core.RepositoryFactoryImpl";
> > >>>
> > >>>     Object repositoryFactory;
> > >>>     try {
> > >>>         Class<?> repositoryFactoryClass = Class.forName(
> > >>>                 repositoryFactoryName, true,
> > >>>                 Thread.currentThread().getContextClassLoader());
> > >>>
> > >>>         repositoryFactory = repositoryFactoryClass.newInstance();
> > >>>     }
> > >>>     catch (Exception e) {
> > >>>         throw new RepositoryException(e);
> > >>>     }
> > >>>
> > >>>     if (repositoryFactory instanceof RepositoryFactory) {
> > >>>         return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
> > >>>     }
> > >>>     else {
> > >>>         throw new RepositoryException(
> > >>>                 repositoryFactory + " is not a RepositoryFactory");
> > >>>     }
> > >>> }
> > >>>
> > >>> 2) Use the factory to get a repo:
> > >>> public Repository getRepository()
> > >>>         throws ClassNotFoundException, RepositoryException {
> > >>>     Map<String, RepositoryConfig> parameters = Collections.singletonMap(
> > >>>             "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
> > >>>             (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));
> > >>>
> > >>>     return getRepository(parameters);
> > >>> }
> > >>>
> > >>> 3) Repository Config:
> > >>> private static final class RepositoryConfigImpl implements RepositoryConfig {
> > >>>
> > >>>     private String jackabbitServerUrl;
> > >>>
> > >>>     private RepositoryConfigImpl(String jackabbitServerUrl) {
> > >>>         super();
> > >>>         this.jackabbitServerUrl = jackabbitServerUrl;
> > >>>     }
> > >>>
> > >>>     public CacheBehaviour getCacheBehaviour() {
> > >>>         return CacheBehaviour.INVALIDATE;
> > >>>     }
> > >>>
> > >>>     public int getItemCacheSize() {
> > >>>         return 100;
> > >>>     }
> > >>>
> > >>>     public int getPollTimeout() {
> > >>>         return 5000;
> > >>>     }
> > >>>
> > >>>     public RepositoryService getRepositoryService() throws RepositoryException {
> > >>>         BatchReadConfig brc = new BatchReadConfig() {
> > >>>             public int getDepth(Path path, PathResolver resolver)
> > >>>                     throws NamespaceException {
> > >>>                 return 1;
> > >>>             }
> > >>>         };
> > >>>         return new RepositoryServiceImpl(jackabbitServerUrl, brc);
> > >>>     }
> > >>>
> > >>> }
> > >>>
> > >>> Thanks for your time.
> > >>>
> > >>> David
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >
> >
> > --
> >
> > Dirk Rudolph | Senior Software Engineer
> >
> > Netcentric AG
> >
> > M: +41 79 642 37 11
> > D: +49 174 966 84 34
> >
> > dirk.rudolph@netcentric.biz | www.netcentric.biz
> >

--047d7b3442069703ac05247b9973--