lucene-java-user mailing list archives

From Ivan Vasilev <ivasi...@sirma.bg>
Subject Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Date Mon, 24 Mar 2008 09:33:09 GMT
Yes Michael, it seems our app spends its time retrieving docs from their 
sources. I have to run a profiler to see where the bottleneck is 
in our case.
Thanks to you and Uwe for the answers!

Ivan
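
A rough way to reconcile the two figures reported further down in this
thread (adding documents about 5 times faster, the whole run only about 8%
faster) is Amdahl's law. The sketch below is an illustration in plain Java,
not something posted by the participants. The class and method names are
hypothetical, and it assumes that "8%" means the total wall-clock time
shrank by 8% and that only the document-adding step got faster.

```java
// Hypothetical helper: back-of-the-envelope Amdahl's law arithmetic.
class IndexingSpeedup {

    /**
     * Fraction of the OLD total run time that was spent in the step that
     * got faster, given that step's speedup factor and the observed
     * reduction of the total run time (0.08 means 8% less time).
     */
    static double fractionInSpedUpPart(double partSpeedup, double totalTimeReduction) {
        if (partSpeedup <= 1.0) {
            throw new IllegalArgumentException("partSpeedup must be > 1");
        }
        // reduction = fraction * (1 - 1/partSpeedup), solved for fraction
        return totalTimeReduction / (1.0 - 1.0 / partSpeedup);
    }

    /** New total run time relative to the old total (old total = 1.0). */
    static double relativeTotalTime(double fraction, double partSpeedup) {
        return (1.0 - fraction) + fraction / partSpeedup;
    }

    public static void main(String[] args) {
        // Figures from the thread: 5x faster document adds, 8% less total time.
        double p = fractionInSpedUpPart(5.0, 0.08);
        System.out.printf("share of old run time spent adding documents: %.2f%n", p);
        System.out.printf("new total time relative to old: %.2f%n",
                relativeTotalTime(p, 5.0));
    }
}
```

Under those assumptions, roughly 10% of the old run time was spent adding
documents, so even an infinitely fast add step could not cut the total by
more than about 10%. The remainder is spent elsewhere, for example
retrieving documents from their sources or in steps that were not timed.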

Michael McCandless wrote:
>
> Ivan, can you describe your application in more detail?
>
> The overall time for indexing has gotten much faster in 2.3, but this 
> is assuming things like retrieving a document from its original 
> source, filtering it, etc, are minimal.  If you have an application 
> where most of the time is spent outside Lucene then the 2.3 speedups 
> won't result in a very large speedup for your application.
>
> Mike
>
> Uwe Goetzke wrote:
>> Hi Ivan,
>> No, we do not use StandardAnalyzer or StandardTokenizer.
>>
>> Most data is processed by
>>     fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
>>     result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
>>     result = new org.apache.lucene.analysis.LowerCaseFilter(result);
>>     result = new org.apache.lucene.analysis.NGramStemFilter(result,2); // just a bigram tokenizer
>>
>> We use our own query parser. The bigrams are searched with a tolerant 
>> phrase query, which scores each doc by the largest clusters of bigrams 
>> covering the phrase token.
>>
>> Best Regards
>>
>> Uwe
>>
>> -----Original Message-----
>> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
>> Sent: Friday, 21 March 2008 16:25
>> To: java-user@lucene.apache.org
>> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>>
>> Hi Uwe,
>>
>> Could you tell me which Analyzer you use, given that you saw such a big
>> indexing speedup?
>> If you use StandardAnalyzer (which uses StandardTokenizer), that may be
>> the reason. See the next-to-last post in the thread "Indexing
>> Speed: 2.3 vs 2.2 (real world numbers)". According to the poster, Jake
>> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
>> which is generated by JFlex instead of JavaCC.
>> I am asking because I noticed a great speedup in adding documents to the
>> index in our system. We time this step in debug mode. NOW
>> THEY ARE ADDED 5 TIMES FASTER!!!
>> But at the same time, the total indexing process in our case has
>> improved by only about 8%. As our system is very big and complex, I am
>> wondering whether the whole indexing process really did get remarkably
>> faster and our system is causing the slowdown, or whether Lucene performs
>> some optimizations on the index, merges, or something else, and that is
>> why the total indexing process is not comparably faster.
>>
>> Best Regards,
>> Ivan
>>
>>
>>
>> Uwe Goetzke wrote:
>>> This week I switched the Lucene library version on one customer system.
>>> The indexing time went down from 46m32s to 16m20s for the complete task,
>>> including optimisation. Great job!
>>> We index product catalogs from several suppliers; in this case around
>>> 56,000 product groups and 360,000 products, including descriptions, were
>>> indexed.
>>>
>>> Regards
>>>
>>> Uwe
>>>
>>>
>>>
>>> -----------------------------------------------------------------------
>>> Healy Hudson GmbH - D-55252 Mainz Kastel
>>> Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
>>>
>>> This email is confidential. If you are not the intended recipient, 
>>> you must not disclose or use this information contained in it. If 
>>> you have received this email in error please tell us immediately by 
>>> return email and delete the document.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>> __________ NOD32 2913 (20080301) Information __________
>>>
>>> This message was checked by NOD32 antivirus system.
>>> http://www.eset.com
>>>
>>>
>>>
>>>
>>
>>

