lucene-java-user mailing list archives

From "Wouter Heijke" <whei...@xs4all.nl>
Subject Re: Improving indexing speed
Date Thu, 17 Nov 2011 15:09:50 GMT
Hi,
We faced a similar problem.
The solution was to give the index writer less work and let worker threads
do the heavy processing. The workers produce pre-processed (analyzed and
tokenized) Documents that the writer can then index without any further
processing.

Wouter
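
The approach Wouter describes above can be sketched roughly as follows. This is only an illustration of the pattern, not his actual code and not the Lucene API: PreparedDoc, DocSink, prepare and indexAll are hypothetical names, and DocSink stands in for whatever single writer receives the finished documents. Worker threads do the expensive parsing and tokenizing, and one thread hands the results to the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PreprocessThenIndex {

    /** Hypothetical stand-in for a document that is already parsed and tokenized. */
    public static final class PreparedDoc {
        public final String id;
        public final List<String> tokens;

        PreparedDoc(String id, List<String> tokens) {
            this.id = id;
            this.tokens = tokens;
        }
    }

    /** Hypothetical stand-in for the single index writer. */
    public interface DocSink {
        void add(PreparedDoc doc);
    }

    /** The expensive part (parse + tokenize); this runs on the worker threads. */
    static PreparedDoc prepare(String id, String rawText) {
        List<String> tokens = new ArrayList<>();
        for (String t : rawText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return new PreparedDoc(id, tokens);
    }

    /**
     * Each entry of rawFiles is {id, rawText}. Workers prepare documents in
     * parallel; the calling thread passes them to the sink in submission order.
     * Returns the number of documents handed to the sink.
     */
    public static int indexAll(List<String[]> rawFiles, int workers, DocSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<PreparedDoc>> pending = new ArrayList<>();
            for (String[] file : rawFiles) {
                Callable<PreparedDoc> task = () -> prepare(file[0], file[1]);
                pending.add(pool.submit(task));
            }
            int count = 0;
            for (Future<PreparedDoc> f : pending) {
                try {
                    // The writer side does no parsing, it only stores the result.
                    sink.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a worker", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("worker failed", e.getCause());
                }
                count++;
            }
            return count;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real setup the sink would wrap the shared index writer. Whether this helps depends on where the time actually goes, so it is worth measuring first.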

> Hi
>
> The file to be indexed depends on the type of Document / data extractor
> ....
>
> My Documents are usually XML, and each run indexes 2+ million XML files
> in less than 5 minutes.
>
>
>
>
> with regards
> karthik
>
> On Fri, Nov 11, 2011 at 1:17 AM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> And how long does it take just to read and parse the files, without
>> indexing them?  Often that is the problem - nothing to do with lucene.
>>
>> There is plenty of good advice in
>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.  A good match
>> on the subject of your message!
>>
>> --
>> Ian.
>>
>>
>> On Thu, Nov 10, 2011 at 7:22 PM, Simon Willnauer
>> <simon.willnauer@googlemail.com> wrote:
>> > Can you provide more information about your setup? Things like how
>> > much time it takes to index your documents, how many docs you
>> > index, what your index writer settings are, how many cores you
>> > have, and where you read from and write to (disks). Oh, and what version
>> > of Lucene are you using?
>> >
>> > thanks,
>> >
>> > simon
>> >
>> > On Thu, Nov 10, 2011 at 10:40 AM, antony jospeh
>> > <antony.joseph.webmail@gmail.com> wrote:
>> >> Hi all,
>> >>
>> >> I have a large number of files in a directory that need to be indexed.
>> >> All the files are in a specific format and need to be parsed to
>> >> extract information before I can index them.
>> >> A single thread processed one file at a time, so I decided to use
>> >> multiple threads: the main thread loops over the directory and passes
>> >> each file to a pool of worker threads using a queue,
>> >> all of which share the same index writer. However, there is no
>> >> significant change in indexing speed.
>> >>
>> >> Any hints on what I am doing wrong, or any suggestions?
>> >>
>> >>
>> >> Thanks
>> >> Antony
>> >>
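
Ian's suggestion in the thread above, to check how long reading and parsing take without indexing, can be sketched with a small timing helper. This is a minimal sketch under assumptions: ParseTiming and timeNanos are hypothetical names, and the step you pass in (parse only, or parse plus index) is your own code.

```java
import java.util.List;
import java.util.function.Consumer;

public class ParseTiming {

    /** Runs the step over every input and returns the elapsed wall-clock nanoseconds. */
    public static <T> long timeNanos(List<T> inputs, Consumer<T> step) {
        long start = System.nanoTime();
        for (T input : inputs) {
            step.accept(input);
        }
        return System.nanoTime() - start;
    }
}
```

Running it once with a parse-only step and once with parse plus index over the same files gives a rough idea of how much of the total is spent outside the index writer. If the two numbers are close, adding indexing threads is unlikely to change much.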