jackrabbit-users mailing list archives

From Nelson Takashi Omori <nelson.om...@murah.com.br>
Subject Re: AW: Rebuilding index
Date Fri, 23 Nov 2012 20:33:03 GMT
Thank you, Claus.

I'll try to configure a cluster and see how it works.

Another thing about the index rebuild: my computer's CPU usage stayed at
about 16%, and disk access and memory usage were low too, so the
machine's resources were not being fully used. I then ran it in debug
mode and saw this message many times:
"Executor is under load, will schedule 1987 remaining tasks for 50 ms later"

Digging deeper, I found that Jackrabbit creates each text extraction task
as a low priority task. The execution of this kind of task is controlled
by the value "maxLoadForLowPriorityTasks" in JackrabbitThreadPool,
which is read from the system property
"org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks".
If this property is not set, or its value is not between 0 and 100,
Jackrabbit uses 75 by default. The value determines whether a low
priority task may run, based on the number of threads that are active at
that moment. With the default value, if more than 75% of the threads are
in use, the task is rescheduled for later.
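
As a rough sketch of the check described above (this is not the actual
Jackrabbit source; the class and method names are made up for
illustration, and only the property name, the 0 to 100 range and the
default of 75 come from what I found):

```java
public class LowPriorityLoadSketch {

    // Property name as used by Jackrabbit; everything else here is illustrative.
    public static final String PROPERTY =
            "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";

    public static final int DEFAULT_MAX_LOAD = 75;

    // Reads the threshold from the system property, falling back to 75
    // when the property is missing, not a number, or outside 0..100.
    public static int readMaxLoad() {
        int max = Integer.getInteger(PROPERTY, DEFAULT_MAX_LOAD);
        return (max < 0 || max > 100) ? DEFAULT_MAX_LOAD : max;
    }

    // Returns true when a low priority task should be postponed.
    // A threshold of 0 is treated as "never postpone" (check disabled).
    public static boolean isOverloaded(int activeThreads, int poolSize, int maxLoad) {
        if (maxLoad == 0) {
            return false;
        }
        double loadPercent = 100.0 * activeThreads / poolSize;
        return loadPercent > maxLoad;
    }

    public static void main(String[] args) {
        int maxLoad = readMaxLoad();
        // Example numbers only: 8 busy threads in a pool of 10.
        System.out.println("maxLoad=" + maxLoad
                + ", postpone=" + isOverloaded(8, 10, maxLoad));
    }
}
```

With a threshold of 75, 8 active threads out of 10 (80%) would postpone
the task, while 7 out of 10 (70%) would let it run.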

So I set the system property
"org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks"
to "0". Jackrabbit then skips this check, and the process was faster:
the rebuild took about 2 hours to complete. CPU usage fluctuated between
50% and 90%, memory was used up to the limit, and the disk was accessed
more steadily. It is probably a good idea to increase the allocated
memory before you try this.
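
System properties are normally passed on the JVM command line as
-D<name>=<value>. If that is not convenient, something like the sketch
below could set the property from code and restore the previous state
afterwards. The class and method names are hypothetical, and I am
assuming that the property is set before Jackrabbit reads it, so it
would have to run before the repository is started:

```java
public class LowPriorityThrottleToggle {

    // Property name as used by Jackrabbit; the helper itself is illustrative.
    public static final String PROPERTY =
            "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";

    // Sets the property to "0" while the given task runs, then restores
    // whatever value (or absence of a value) was there before.
    public static void runWithThrottleDisabled(Runnable rebuild) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, "0");
        try {
            rebuild.run();
        } finally {
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder task: the repository start and index rebuild would go here.
        runWithThrottleDisabled(() -> System.out.println(
                PROPERTY + "=" + System.getProperty(PROPERTY)));
    }
}
```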

In my scenario it makes sense to set this value to "0", because my
client can't use the system while the rebuild is running, so I can use
all the available resources to finish as soon as possible. After the
rebuild, you should remove the property so that Jackrabbit can throttle
low priority tasks again.

Maybe this can help someone who has to rebuild the index as quickly as
possible and doesn't have the cluster mentioned by Claus.

On 23/11/2012 06:29, KÖLL Claus wrote:
> Hi Nelson,
>
>> 1) Reading the source code, jackrabbit is using LazyTextExtractorField
>> (and other classes) to execute the extraction in a separate thread.
>> Doesn't it do exactly what I want? But, even so I waited 3 hours and
>> the repository wasn't initialized and ready to use. Is it normal?
> First .. yes this is normal ..
> and yes you are right about extraction in a separate thread .. this
> happens on session.save() operation. If you start the repository it
> will start to re-index it if the index is not present.
> In that way jackrabbit does not separate between full text indexing and
> "normal" node/property indexing. So the start will take much time
> depending on your content.
>
>> 2)  What I'm planning to do is the best approach? Did anybody make something similar?
> One way to handle such index recovering is to create a cluster. Let's
> assume you would have 2 cluster members where one is the primary and
> the other one is a hot standby member.
> If you have problems with the index on the primary cluster member you
> could copy the index folder from the standby cluster member.
> If you like you could re-index the repository on your standby member
> while the primary is running.
>
> greets
> claus
>
