jackrabbit-dev mailing list archives

From Ajai <ajaik...@gmail.com>
Subject Re: Performance of Jackrabbit
Date Tue, 28 Jul 2009 16:07:23 GMT

Hi Team,

Thanks for the responses.

I was able to upload 25,000 folders, each with 15 documents, to a Derby
database.

When I tried to add a new document to one of these folders, the addition
took a long time. The document I used is a 2.5 MB PDF.

I used a profiler to look into this issue, and it seems PDFBox is taking a
lot of time.
I had also set both the "indexMergerPoolSize" and "extractorPoolSize"
parameters to 50.

Can you help me resolve this problem?
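
To narrow down where the time goes outside of a profiler, the addition could
be timed phase by phase. Below is a minimal sketch of that idea. PhaseTimer
and the phase names are hypothetical helpers, not part of Jackrabbit or JCR,
and the sleeps only stand in for the real addNode/setProperty and
session.save() calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Jackrabbit class): accumulates elapsed time per named phase.
class PhaseTimer {
    // A step that may throw checked exceptions, as JCR calls do.
    interface Step {
        void run() throws Exception;
    }

    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the step and adds its elapsed time to the total for the given phase.
    void time(String phase, Step step) {
        long start = System.nanoTime();
        try {
            step.run();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a phase, or 0 if it was never timed.
    long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    Map<String, Long> totals() {
        return nanosByPhase;
    }
}

class PhaseTimerDemo {
    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Placeholder work: replace with the node-building code and session.save().
        timer.time("build nodes", () -> Thread.sleep(5));
        timer.time("save", () -> Thread.sleep(20));
        timer.totals().forEach((phase, nanos) ->
                System.out.printf("%-12s %d ms%n", phase, nanos / 1_000_000));
    }
}
```

If most of the time lands in the save phase while text extraction is enabled,
that would be consistent with what the profiler shows for PDFBox.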

Thanks 
Ajai G



Stefan Guggisberg wrote:
> 
> On Mon, Jul 27, 2009 at 4:36 PM, Ajai<ajaiking@gmail.com> wrote:
>>
>> Actually, I am doing it the right way as you mentioned, calling session.save()
>> after each file.
>> But I do have text extractors and indexes turned on.
>> My Configuration:
>>
>> for searchindex:
>>
>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>                </SearchIndex>
>>
>>
>> My Index config:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE configuration SYSTEM
>> "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
>>        xmlns:jcr="http://www.jcp.org/jcr/1.0">
>>        <index-rule nodeType="nt:file">
>>                <property>jcr:content</property>
>>        </index-rule>
>>        <index-rule nodeType="nt:resource">
>>                <property>jcr:data</property>
>>        </index-rule>
>> </configuration>
>>
>> Kindly tell me the optimal way to use them.
> 
> as already suggested in my earlier post:
> 
> 1. disable search index or text extractors and compare results
> 2. remove checkin() call and compare results
> 3. use embedded derby and compare results
> 4. if you provide GenRandom.java, i'll run the test on my own machine.
> 
> cheers
> stefan
> 
>>
>>
>> Thanks
>> Ajai G
>>
>>
>>
>> Guo Du wrote:
>>>
>>> On Mon, Jul 27, 2009 at 2:56 PM, Ajai<ajaiking@gmail.com> wrote:
>>>>
>>>> Hi Guo,
>>>>
>>>> Yes, I am adding a document to the repository.
>>>> Are there multiple ways to do a save?
>>>>
>>>> I am doing it the following way:
>>>>
>>>> fileNode = matterNode.addNode(fileName, "nt:file");
>>>> fileNode.addMixin("mix:versionable");
>>>> fileNode.addMixin("mix:referenceable");
>>>> Node resNode = fileNode.addNode("jcr:content", "nt:resource");
>>>> resNode.addMixin("mix:versionable");
>>>> resNode.addMixin("mix:referenceable");
>>>> resNode.setProperty("jcr:mimeType", mimeType);
>>>> resNode.setProperty("jcr:encoding", ENCODING_UTF_8);
>>>> resNode.setProperty("jcr:data", new FileInputStream(file));
>>>> Calendar lastModified = Calendar.getInstance();
>>>> lastModified.setTimeInMillis(file.lastModified());
>>>> resNode.setProperty("jcr:lastModified", lastModified);
>>>> // finally
>>>> session.save();
>>>>
>>>> Please suggest if any changes can be done.
>>>>
>>>
>>>
>>> Your code doesn't show details of the loop.
>>>
>>>
>>> WRONG
>>> ==============
>>> loop{ // 375000 times
>>>   addNode(...)
>>> }
>>> session.save();
>>> ==============
>>>
>>>
>>>
>>> CORRECT
>>> ==============
>>> loop{ // 375000 times
>>>   addNode(...)
>>>   session.save();
>>> }
>>> ==============
>>> You may also add multiple documents before calling session.save() to take
>>> advantage of batch processing, which is more efficient. But do not wait
>>> until all 375000 documents have been added.
>>>
>>> --Guo
>>>
>>>
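
The batching Guo describes above (save after every N documents rather than
after each one or only at the very end) could be sketched as follows.
BatchSaver, the batch size of 100, and the document count are hypothetical
choices for illustration, not part of Jackrabbit or JCR. The save action is
where session.save() would go.

```java
// Hypothetical helper (not a Jackrabbit class): triggers a save once per batch of additions.
class BatchSaver {
    // A save action that may throw checked exceptions, as session.save() does.
    interface SaveAction {
        void save() throws Exception;
    }

    private final int batchSize;
    private final SaveAction saveAction;
    private int pending;
    private int saves;

    BatchSaver(int batchSize, SaveAction saveAction) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.saveAction = saveAction;
    }

    // Call after each document has been added to the session.
    void documentAdded() {
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    // Saves any pending additions; call once more after the loop ends.
    void flush() {
        if (pending == 0) {
            return;
        }
        try {
            saveAction.save();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pending = 0;
        saves++;
    }

    int saveCount() {
        return saves;
    }
}

class BatchSaverDemo {
    public static void main(String[] args) {
        // In real code the lambda body would be session.save().
        BatchSaver saver = new BatchSaver(100, () -> { });
        for (int i = 0; i < 1050; i++) {
            // addNode(...) for one document would go here.
            saver.documentAdded();
        }
        saver.flush(); // saves the final partial batch of 50
        System.out.println("saves: " + saver.saveCount()); // prints "saves: 11"
    }
}
```

A suitable batch size depends on document size and available memory, since
unsaved changes are held in the session until save() is called.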
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24681862.html
>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24702639.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.

