lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Date Mon, 24 Mar 2008 08:54:51 GMT

Ivan, can you describe your application in more detail?

Overall indexing has gotten much faster in 2.3, but this assumes that
steps like retrieving a document from its original source, filtering
it, etc., take minimal time.  If most of your application's time is
spent outside Lucene, then the 2.3 speedups won't translate into a very
large speedup for your application.

Mike
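
The point above can be made concrete with the figures from the quoted messages below. This is only a back-of-the-envelope sketch: it assumes Ivan's "about 8%" means the total run time dropped by 8%, and that the only thing that changed is the document-adding step becoming 5 times faster. The class and method names are made up for illustration. Under those assumptions the adding step accounted for roughly 10% of the original run time, which is consistent with most of the time being spent outside Lucene.

```java
// Back-of-the-envelope estimate; names here are illustrative, not from Lucene.
class SpeedupEstimate {

    // If a step taking fraction f of the run becomes s times faster, then
    //   newTime / oldTime = (1 - f) + f / s = 1 - r
    // where r is the observed overall reduction. Solving for f:
    //   f = r / (1 - 1/s)
    static double fractionInAcceleratedStep(double localSpeedup, double overallReduction) {
        return overallReduction / (1.0 - 1.0 / localSpeedup);
    }

    // Overall speedup factor from two wall-clock times in seconds.
    static double speedup(double oldSeconds, double newSeconds) {
        return oldSeconds / newSeconds;
    }

    public static void main(String[] args) {
        // Ivan: adds are 5x faster, whole process about 8% faster (assumed: 8% less time).
        double f = fractionInAcceleratedStep(5.0, 0.08);
        System.out.println("share of original time spent adding documents: " + f);

        // Uwe: 46m32s -> 16m20s for the complete task.
        double uwe = speedup(46 * 60 + 32, 16 * 60 + 20);
        System.out.println("Uwe's overall speedup factor: " + uwe);
    }
}
```

By the same arithmetic, Uwe's run (46m32s down to 16m20s) is an overall speedup of a bit under 3x, which suggests his job spends a much larger share of its time inside Lucene.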

Uwe Goetzke wrote:
> Hi Ivan,
> No, we do not use StandardAnalyzer or StandardTokenizer.
>
> Most data is processed by
> 	fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
> 	result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
> 	result = new org.apache.lucene.analysis.LowerCaseFilter(result);
> 	result = new org.apache.lucene.analysis.NGramStemFilter(result, 2); // just a bigram tokenizer
>
> We use our own query parser. The bigrams are searched with a
> tolerant phrase query that scores, within a document, the largest
> bigram clusters covering the phrase tokens.
>
> Best Regards
>
> Uwe
>
> -----Original Message-----
> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
> Sent: Friday, 21 March 2008 16:25
> To: java-user@lucene.apache.org
> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>
> Hi Uwe,
>
> Could you tell me which Analyzer you were using when you measured such
> a large indexing speedup?
> If you use StandardAnalyzer (which uses StandardTokenizer), that may be
> the reason. See the second-to-last report in the thread "Indexing
> Speed: 2.3 vs 2.2 (real world numbers)". According to the reporter, Jake
> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
> which is generated by JFlex instead of JavaCC.
> I am asking because I noticed a great speedup in adding documents to
> the index in our system. We time this step in debug mode. NOW
> THEY ARE ADDED 5 TIMES FASTER!!!
> At the same time, the total indexing process in our case improved by
> only about 8%. As our system is very big and complex, I am wondering
> whether the whole indexing process really is that much faster and our
> system causes the slowdown, or whether Lucene does some optimizations
> on the index, merges, or something else, and that is why the total
> indexing process is not correspondingly faster.
>
> Best Regards,
> Ivan
>
>
>
> Uwe Goetzke wrote:
>> This week I switched the Lucene library version on one customer
>> system.
>> The indexing time went down from 46m32s to 16m20s for the complete
>> task, including optimisation. Great job!
>> We index product catalogs from several suppliers; in this case around
>> 56,000 product groups and 360,000 products, including descriptions,
>> were indexed.
>>
>> Regards
>>
>> Uwe
>>
>>
>>
>> -----------------------------------------------------------------------
>> Healy Hudson GmbH - D-55252 Mainz Kastel
>> Managing Director Christian Konhäuser - Wiesbaden District Court, HRB 12076
>>
>> This email is confidential. If you are not the intended recipient,  
>> you must not disclose or use this information contained in it. If  
>> you have received this email in error please tell us immediately  
>> by return email and delete the document.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

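The analysis chain Uwe describes in the quoted message (whitespace tokenization, accent folding with ö -> oe, lowercasing, then bigrams) can be sketched without any Lucene classes to show what kind of tokens end up in the index. This is a rough illustration only: `ISOLatin2AccentFilter` and `NGramStemFilter` are Uwe's own classes and their exact behaviour is not shown in the thread, so the folding table beyond ö and the character-bigram scheme below are assumptions, and all names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of the described chain; it does not use or reproduce Lucene's API.
class BigramChainSketch {

    // Accent folding. Only "ö -> oe" is stated in the thread; the other
    // mappings are assumed by analogy. Unicode escapes avoid source-encoding issues.
    static String foldAccents(String token) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\u00f6': sb.append("oe"); break; // ö (from the thread)
                case '\u00d6': sb.append("Oe"); break; // Ö (assumed)
                case '\u00e4': sb.append("ae"); break; // ä (assumed)
                case '\u00fc': sb.append("ue"); break; // ü (assumed)
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Overlapping character bigrams of one token (assumed scheme).
    // Tokens shorter than two characters are passed through unchanged.
    static List<String> bigrams(String token) {
        List<String> out = new ArrayList<>();
        if (token.length() < 2) {
            out.add(token);
            return out;
        }
        for (int i = 0; i + 2 <= token.length(); i++) {
            out.add(token.substring(i, i + 2));
        }
        return out;
    }

    // Same order as in the quoted code: split on whitespace, fold, lowercase, bigram.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String folded = foldAccents(token);
            out.addAll(bigrams(folded.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Sch\u00f6ne M\u00f6bel"));
    }
}
```

With a chain like this, none of the steps involve StandardTokenizer, so the JFlex change Ivan mentions would not explain Uwe's speedup; the gain would have to come from elsewhere in 2.3.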
