jackrabbit-dev mailing list archives

From Ajai <ajaik...@gmail.com>
Subject Re: How to do Indexing and Extraction in Background threads
Date Thu, 06 Aug 2009 05:21:09 GMT

Could someone please help me with this? I need to resolve it urgently.

Thanks
Ajai G


Ajai wrote:
> 
> Some background information as well:
> 
> 	I uploaded 25,000 folders, each with 15 documents (375,000 documents in
> total), into MS SQL Server 2005. After that, adding a 2.5 MB PDF document
> took around 8 seconds.
> 
> 	We profiled the process and found that most of the time was spent on text
> extraction in PDFBox. The HTTP thread also waited until the extraction
> thread had completed.
>       
> Thanks
> Ajai
> 
> 
> Ajai wrote:
>> 
>> We are using Jackrabbit 1.5.
>> 
>> Thanks
>> Ajai
>> 
>> Marcel Reutegger wrote:
>>> 
>>> that looks OK to me. what version of jackrabbit are you using?
>>> 
>>> regards
>>>  marcel
>>> 
>>> On Wed, Aug 5, 2009 at 12:18, Ajai<ajaiking@gmail.com> wrote:
>>>>
>>>> Also attaching the configuration as a text file (config.txt):
>>>> http://www.nabble.com/file/p24824270/config.txt
>>>>
>>>>
>>>>
>>>> Ajai wrote:
>>>>>
>>>>> Thanks for the response, Marcel.
>>>>> Please find the configuration below:
>>>>>
>>>>> <SearchIndex
>>>>> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   </SearchIndex>
>>>>>
>>>>> Kindly let us know your thoughts
>>>>>
>>>>> Thanks,
>>>>> Ajai G
>>>>>
>>>>>
>>>>>
>>>>> Marcel Reutegger wrote:
>>>>>>
>>>>>> can you please send the configuration again in plain text. the
>>>>>> configuration didn't make it through.
>>>>>>
>>>>>> but in any case, you can set the parameter extractorPoolSize to the
>>>>>> number of background threads that you want to give the text
>>>>>> extraction
>>>>>> process. see also: http://wiki.apache.org/jackrabbit/Search
>>>>>>
>>>>>> regards
>>>>>>  marcel
>>>>>>
>>>>>> On Wed, Aug 5, 2009 at 11:22, Ajai<ajaiking@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Whenever we add a document to the repository, the indexing and
>>>>>>> extraction seem to happen in the same thread. Because of this, adding
>>>>>>> a 2.5 MB document takes around 8 seconds.
>>>>>>>
>>>>>>> We would like the extraction and indexing to be done on a
>>>>>>> background thread.
>>>>>>>
>>>>>>> I have the following configuration for searchIndex in the
>>>>>>> repository.xml
>>>>>>>
>>>>>>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                </SearchIndex>
>>>>>>>
>>>>>>> Please let us know if any configuration changes need to be made.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ajai G
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24823548.html
>>>>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24824270.html
>>>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>>>
>>>>
>>> 
>>> 
>> 
>> 
> 
> 
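
For reference, Marcel's suggestion above (setting extractorPoolSize) might look
roughly like the fragment below in repository.xml. This is only a sketch, not
the configuration from this thread, whose parameters did not survive in the
archive. The parameter name extractorPoolSize comes from Marcel's reply;
extractorTimeout and extractorBackLogSize are companion parameters as I recall
them from the Search wiki page he links, so please check them against that page
for Jackrabbit 1.5. All values, including the index path, are illustrative
assumptions.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Index location; a commonly used value, assumed here. -->
  <param name="path" value="${wsp.home}/index"/>

  <!-- Number of background threads for text extraction (named in Marcel's
       reply). The value 2 is illustrative only. -->
  <param name="extractorPoolSize" value="2"/>

  <!-- Assumption, to be verified on the Search wiki: how long (in ms) the
       indexing thread waits for an extraction before handing it over to a
       background thread. -->
  <param name="extractorTimeout" value="100"/>

  <!-- Assumption, to be verified on the Search wiki: maximum number of
       extraction jobs that may queue up for the background threads. -->
  <param name="extractorBackLogSize" value="100"/>
</SearchIndex>
```

If these parameters behave as described on the wiki, a save should no longer
block for the whole PDF extraction; the extracted text would only become
searchable once the background job has finished.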

-- 
View this message in context: http://www.nabble.com/How-to-do-Indexing-and-Extraction-in-Background-threads-tp24823548p24840415.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.

