jackrabbit-users mailing list archives

From Nicolas Peltier <npelt...@adobe.com>
Subject Re: Rebuilding index
Date Fri, 23 Nov 2012 07:51:55 GMT

There are not many reasons for the index to become inconsistent
(mainly an abrupt stop of the server), and there are ways to fix
inconsistencies (which take time as well). If your search is "well defined"
(i.e. you know that you are/will be searching only for certain
nodes/properties), a simpler way to go is the indexingConfiguration
(configuring the index for only those nodes/properties).
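
As a rough sketch of what such a configuration could look like, assuming
the searches only touch two properties on nt:unstructured nodes (the
property names "title" and "description" and the file name
indexing_configuration.xml are placeholders made up for illustration, not
anything taken from your repository):

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<!-- Hypothetical indexing_configuration.xml; adapt the node type and property names. -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nodes of this type, index only the listed properties. -->
  <index-rule nodeType="nt:unstructured">
    <property>title</property>
    <property>description</property>
  </index-rule>
</configuration>
```

The file would then be referenced from the SearchIndex element in
workspace.xml, for example with
<param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>.
As far as I know, a changed configuration only applies to nodes that are
indexed after the change.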

On 11/22/12 7:53 PM, "Nelson Takashi Omori" <nelson.omori@murah.com.br> wrote:

>Hi All,
>I'm using Jackrabbit 2.4.3 and my repository has approximately 110
>thousand nodes. Of these, about 10 thousand nodes have binary values
>whose content needs to be extracted, using Tika, and indexed in Lucene.
>I decided to delete the index to make Jackrabbit create it again. The
>problem is the time that this operation is taking. I waited for 3 hours
>and the repository still wasn't initialized (I don't know exactly how
>long the repository initialization takes to complete, because I stopped
>the process). With Tika's text extraction disabled, it took 5 minutes, so
>I concluded that the problem is the time that Tika takes to extract text
>from the 10 thousand documents.
>If the index becomes inconsistent and I have to rebuild it, my
>client doesn't want to wait for more than 3 hours to start using the
>system. So I'm planning to create a subclass of
>org.apache.jackrabbit.core.query.lucene.SearchIndex and try to modify
>how the index is re-created. To give my client fast access to
>the repository, first I'll skip the text extraction and create the
>index with the normal properties only. With this structure, I can give my
>client access to the repository and he can do many things using only the
>normal properties. Then, in the background, I'll start the text extraction
>for each document and update its Lucene document with the extracted text.
>I have some questions about it.
>1) Reading the source code, Jackrabbit uses LazyTextExtractorField
>(and other classes) to execute the extraction in a separate thread.
>Doesn't it do exactly what I want? But even so, I waited 3 hours and the
>repository wasn't initialized and ready to use. Is that normal?
>2) Is what I'm planning to do the best approach? Has anybody done
>something similar?
