jackrabbit-users mailing list archives

From "Alexandru Popescu" <the.mindstorm.mailingl...@gmail.com>
Subject Re: possible performance problem (need a way to test it)
Date Tue, 19 Sep 2006 07:28:00 GMT
On 9/19/06, Tobias Bocanegra <tobias.bocanegra@day.com> wrote:
> hi, what is your document order configuration? turning this off can
> speed up your queries.
>
>    <SearchIndex>
>        .....
>        <param name="respectDocumentOrder" value="false"/>
>    </SearchIndex>
>

Hi Toby!

I knew this trick, but as I showed in a previous email, this query (and,
in general, all my queries) uses an order by clause, and as far as I
know this parameter changes nothing in such cases.

However, just to be sure, I have rerun the test with this parameter
set, and I get practically the same result (which supports my
understanding).

./alex
--
.w( the_mindstorm )p.

> regards, toby
>
> btw: the problem with concurrent querying is that large blocks of
> lucene need to be synchronized, which serializes your parallel tasks.
>
>
> On 9/19/06, Alexandru Popescu <the.mindstorm.mailinglist@gmail.com> wrote:
> > I don't know the Jackrabbit code base or the Lucene implementation
> > well enough to comment, but at first glance everything seems
> > okay. So it looks like a limitation of Lucene in highly
> > concurrent environments. Maybe somebody with better knowledge can
> > comment on this.
> >
> > I have just a small remark about the code in the QueryImpl.execute
> > method: it looks like the ACLs are checked twice, once in this method
> > when the UUIDs are first read and a second time in the iterator when
> > the node is fetched. However, I don't think this has any impact on
> > the performance.
> >
> > I am looking forward to any comments, opinions, or advice.
> >
> > ./alex
> > --
> > :Architect of InfoQ.com:
> >  .w( the_mindstorm )p.
> > Co-founder of InfoQ.com
> >
> >
> > On 9/19/06, Alexandru Popescu <the.mindstorm.mailinglist@gmail.com> wrote:
> > > Okay, I am now able to reproduce the problem using an environment as
> > > close as possible to the real one, with a TestNG test that runs the
> > > same invocation a hundred times in parallel threads.
> > >
> > > [code]
> > >     @Test(invocationCount=100, threadPoolSize=50)
> > >     public void fetchRssNews() {
> > >         m_rssContentDao.findOrderedContentList(0, 15, new Filtering());
> > >     }
> > > [/code]
> > >
> > > I used the test only to be able to profile the code. The results
> > > show that most of the time is spent in the following two calls:
> > >
> > > o.a.j.c.query.lucene.QueryHits.doc(int) ->
> > > o.a.lucene.search.Hits.doc(int) (8.046ms of 11.468ms)
> > >
> > > o.a.j.c.query.lucene.SearchIndex.executeQuery(QueryImpl, Query,
> > > QName[], boolean[]) -> o.a.lucene.search.Searcher.search(Query, Sort)
> > > (1.390ms of 11.468ms)
> > >
> > > The executed query is:
> > >
> > > /jcr:root/news/element(*,cmed:translatable)/* order by @cmed:timestamp
> > > descending
> > >
> > > and there are about 200 nodes under /news.
> > >
> > > Do you think there is something I can do to optimize this behavior
> > > before jumping to caching?
> > >
> > > I am getting the impression that if I read the nodes one by one and
> > > checked the properties myself, I could get better performance, so I
> > > really think there is something I can do.
> > >
> > > Any help is highly appreciated,
> > >
> > > ./alex
> > > --
> > > :Architect of InfoQ.com:
> > >  .w( the_mindstorm )p.
> > > Co-founder of InfoQ.com
> > >
> > >
> > > > > > > This was the first part. Now about the real problem I am seeing:
> > > > > > > when accessing the JCR repo from multiple concurrent threads (each
> > > > > > > using its own Session) and performing queries, we see a huge CPU
> > > > > > > load and the response times grow very fast:
> > > > > > > - for 5 concurrent threads the query response times are around
> > > > > > > 200-500 ms; server load about 0.65-0.7
> > > > > > > - for 100 concurrent threads the query response times are around
> > > > > > > 150000-200000 ms; server load about 7-7.5
> > > > > > >
> > > > > > > As you can see these are very dangerous numbers, and I would
> > > > > > > definitely like to figure out what the problem behind them is,
> > > > > > > because in my application I can expect something around 300
> > > > > > > threads accessing it concurrently.
> > > > > > >
> > > > > > > I know I can start looking at different options like caching and
> > > > > > > similar ideas, but first understanding the real problem will help
> > > > > > > me a lot.
> > > > > > >
> > > > > > > Many thanks for any helpful ideas and comments,
> > > > > >
> > > > > > ./alex
> > > > > > --
> > > > > > :Architect of InfoQ.com:
> > > > > >  .w( the_mindstorm )p.
> > > > > > Co-founder of InfoQ.com
> > > > > >
> > >
> >
>
>
> --
> -----------------------------------------< tobias.bocanegra@day.com >---
> Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
> T +41 61 226 98 98, F +41 61 226 98 97
> -----------------------------------------------< http://www.day.com >---
>
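
Toby's remark in this thread, that large synchronized blocks serialize
parallel query tasks, can be illustrated with a small stand-alone Java
sketch. It does not use any Jackrabbit or Lucene code: the class
SerializedSectionDemo, its methods, and the sleep standing in for query
work are all hypothetical. It only shows that when every task has to pass
through one shared lock, at most one thread makes progress at a time, so
a larger thread pool does not shorten the total run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it is not part of Jackrabbit or Lucene.
class SerializedSectionDemo {
    private final Object lock = new Object();
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    // Stand-in for a query whose expensive part runs under a shared lock.
    void simulatedQuery(long workMillis) throws InterruptedException {
        synchronized (lock) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max); // record peak occupancy
            Thread.sleep(workMillis);                   // simulated work
            inside.decrementAndGet();
        }
        completed.incrementAndGet();
    }

    // Runs `invocations` calls on a pool of `threads` workers; returns elapsed ms.
    long run(int invocations, int threads, long workMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(invocations);
        long start = System.nanoTime();
        for (int i = 0; i < invocations; i++) {
            pool.execute(() -> {
                try {
                    simulatedQuery(workMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every invocation has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    int maxConcurrent() { return maxInside.get(); }

    int completedCount() { return completed.get(); }

    public static void main(String[] args) {
        SerializedSectionDemo demo = new SerializedSectionDemo();
        // Same shape as the TestNG test: 100 invocations on 50 threads.
        long elapsed = demo.run(100, 50, 2);
        System.out.println("elapsed ms: " + elapsed
                + ", max threads in section: " + demo.maxConcurrent());
    }
}
```

With 100 invocations on 50 threads and 2 ms of simulated work, the run
should take roughly as long as 100 sequential calls, because only one
thread is ever inside the locked section. How much of a real Jackrabbit
query runs under such a lock is not something this sketch can tell.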
