lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Batch Indexing - best practice?
Date Mon, 15 Mar 2010 15:09:23 GMT
Really depends - StandardAnalyzer is probably a slower analyzer. But for 
example, with my quad core desktop machine, indexing with 3 or 4 
threads, I can do at least a couple hundred wikipedia docs per second 
(though I'm not using StandardAnalyzer). I'm indexing 10,000 docs in 
about a minute.

Does depend on your analzyers, and the size of the docs, but that looks 
pretty slow to me. Giving the JVM enough heap?

On 03/15/2010 11:02 AM, Murdoch, Paul wrote:
> Thanks.  I'll try lowering the merge factor and see if speed increases.
> The indexing is threaded....similar to the utility class in Listing 10.1
> from Lucene in Action.  Search speed is great once the index is
> built....close to real time.  So my main problem is getting the indexing
> speed fixed.  I do use the StandardAnalyzer for most of my fields.  What
> type of performance level should I be trying to hit for indexing
> (docs/sec)...just to give me an idea of what to shoot for?
>
> Paul
>
> -----Original Message-----
> From: java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> [mailto:java-user-return-45433-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> ] On Behalf Of Mark Miller
> Sent: Monday, March 15, 2010 10:48 AM
> To: java-user@lucene.apache.org
> Subject: Re: Batch Indexing - best practice?
>
> On 03/15/2010 10:41 AM, Murdoch, Paul wrote:
>    
>> Hi,
>>
>>
>>
>> I'm using Lucene 2.9.2.  Currently, when creating my index, I'm
>>      
> calling
>    
>> indexWriter.addDocument(doc) for each Document I want to index.  The
>> Documents aren't large and I'm averaging indexing about 500 documents
>> every 90 seconds.  I'd like to try and speed this up....unless 90
>> seconds for 500 Documents is reasonable.  I have the merge factor set
>>      
> to
>    
>> 1000.  Do you have any suggestions for batch indexing?  Is there
>> something like indexWriter.addDocuments(Document[] docs) in the API?
>>
>>
>>
>> Thanks.
>>
>> Paul
>>
>>
>>
>>
>>
>>      
> You should lower that merge factor - thats *really* high.
>
> You shouldn't really need much more than 50 or so ... and for search
> speed your going to want fewer segments anyway -
> if your just going to end up optimizing at the end, there is no reason
> for such a large merge factor - you will pay for most of what
> you saved when you optimize.
>
> That is very slow by the way. Should be much faster - especially if you
> are using multiple threads.
>
>    


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message