jackrabbit-users mailing list archives

From "Thomas Mueller" <thomas.tom.muel...@gmail.com>
Subject Re: NameFactoryImpl$NameImpl
Date Fri, 01 Feb 2008 09:52:51 GMT
Hi,

Would it be possible for you to create a simple, standalone test case
(that means, no external dependencies except Jackrabbit core, a single
class with a main method, similar to the FirstHop examples)? Best
would be if the memory problem can be reproduced using the standard
configuration; if not, could you also send the configuration you use?

Getting rid of the duplicate Strings would be fairly easy (using a
simple string cache), but we need to be sure we have a test case so we
know we solve the right problem.
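
To illustrate what I mean by a simple string cache, here is a minimal
sketch. It is not Jackrabbit code: the class name StringCache and its
get() method are hypothetical, and where such a call would go inside
NameFactoryImpl is an assumption that the test case would have to
confirm. The map is unbounded, so a real fix would probably need weak
references or a size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical helper (not part of Jackrabbit): returns one shared
 * instance for every distinct string value passed in.
 */
final class StringCache {

    // Maps each string value to the first instance seen with that value.
    private static final ConcurrentMap<String, String> CACHE =
            new ConcurrentHashMap<>();

    private StringCache() {
        // static utility, no instances
    }

    /** Returns the canonical instance equal to s, registering s if new. */
    static String get(String s) {
        if (s == null) {
            return null;
        }
        String existing = CACHE.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```

With something like this, a local name such as "root" would be passed
through StringCache.get() once when a name object is created, so that
equal names share one String instead of each holding its own copy.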

Thanks,
Thomas


On Jan 23, 2008 5:51 PM, Andrey Adamovich <super_filin_aaa@yahoo.com> wrote:
> Hello guys!
>
> We have implemented a JCR facade for our portal system based on Jackrabbit. The facade
> delegates its calls to the Jackrabbit repository, and if the data is not available, a
> request to a legacy CMS is performed and the data is inserted into Jackrabbit.
>
> The structure of repository is similar to:
>
> /root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty
>
> The number of possible path values at each level is from 2 to 20.
> id and sub_id are unique identifiers of the document; under them, the document structure
> is stored with a maximum depth of 5. Many documents have the same type of structure and
> the same property names.
>
> We faced some performance bottlenecks, so I profiled our application (with YourKit Java
> Profiler) and noticed that there are many duplicate strings stored on the heap. Almost
> all of those duplicates are held by instances of the class
> org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl.
>
> After doing several portal page requests (about 1000 JR requests) and taking a memory
> snapshot, I noticed that the string "root" is held by NameImpl instances about 12000
> times, which is about 2 MB of waste. Other strings, with the values of the repository
> level names and property names, had from 3000 to 11000 duplicates each. The total
> calculated waste is about 50 MB, and that is after relatively few requests.
>
> It is probably not the only memory/performance bottleneck, and it could also be that our
> app is doing something wrong, but it would be good to get some ideas on this from you guys.
>
> After leaving the server idle for a while (6-8 hours), I took a memory snapshot again.
> The number of duplicates had reduced slightly, but I would not say that it changed much
> or that many of the duplicate strings had been garbage-collected.
>
> I have also looked at the source of NameFactoryImpl$NameImpl and found that it uses
> String.intern() for the namespace, but not for the local name part, which is wise in
> general, but may not work well if Jackrabbit is under a heavy request load.
>
> Therefore I have several questions that some of you may be able to help me with:
>
> 1) Is there a way to implement a different name creation strategy? I see that NameFactory
> is an interface, but how would I plug in a different implementation adapted to my
> repository structure, so that the string "root" would not be stored 12000 times or more?
>
> 2) Can someone explain to me how the JR cache manager works, and can this leak happen
> because the cache manager stores too many states? Does the size of the JR cache depend on
> the number of live sessions? Would it be wise to disable it, or at least limit it?
>
> Best regards,
>
> Andrey
