Yes, Michael, it seems our app takes time to retrieve docs from their
sources. I will have to run a profiler to see where the bottleneck is in
our case.

Thanks to you and Uwe for the answers!

Ivan

Michael McCandless wrote:
>
> Ivan, can you describe more about your application?
>
> The overall time for indexing has gotten much faster in 2.3, but this
> is assuming things like retrieving a document from its original
> source, filtering it, etc., are minimal. If you have an application
> where most of the time is spent outside Lucene, then the 2.3 speedups
> won't result in a very large speedup for your application.
>
> Mike
>
> Uwe Goetzke wrote:
>> Hi Ivan,
>> No, we do not use StandardAnalyzer or StandardTokenizer.
>>
>> Most data is processed by
>> fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
>> result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
>> result = new org.apache.lucene.analysis.LowerCaseFilter(result);
>> result = new org.apache.lucene.analysis.NGramStemFilter(result,2); // just a bigram tokenizer
>>
>> We use our own query parser. The bigrams are searched with a tolerant
>> phrase query that scores, within a doc, the largest bigram clusters
>> covering the phrase tokens.
>>
>> Best Regards
>>
>> Uwe
>>
>> -----Original Message-----
>> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
>> Sent: Friday, 21 March 2008 16:25
>> To: java-user@lucene.apache.org
>> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>>
>> Hi Uwe,
>>
>> Could you tell me which Analyzer you used when you measured such a big
>> indexing speedup?
>> If you use StandardAnalyzer (which uses StandardTokenizer), that may be
>> the reason. See the second-to-last post in the thread "Indexing
>> Speed: 2.3 vs 2.2 (real world numbers)".
>> According to the poster, Jake Mannix, this is because StandardTokenizer
>> now uses StandardTokenizerImpl, which is generated by JFlex instead of
>> JavaCC.
>> I am asking because I noticed a great speedup in adding documents to the
>> index in our system. We time this step in debug mode. NOW
>> THEY ARE ADDED 5 TIMES FASTER!!!
>> But at the same time, the total indexing process in our case has
>> improved by only about 8%. As our system is very big and complex, I am
>> wondering whether the whole indexing process really is reduced so
>> remarkably and our system causes the slowdown, or whether Lucene does
>> some optimizations on the index, merges, or something else, and that is
>> the reason the total indexing process is not comparably faster.
>>
>> Best Regards,
>> Ivan
>>
>> Uwe Goetzke wrote:
>>> This week I switched the lucene library version on one customer system.
>>> The indexing time went down from 46m32s to 16m20s for the complete
>>> task, including optimisation. Great job!
>>> We index product catalogs from several suppliers; in this case around
>>> 56,000 product groups and 360,000 products including descriptions were
>>> indexed.
>>>
>>> Regards
>>>
>>> Uwe
>>>
>>> -----------------------------------------------------------------------
>>> Healy Hudson GmbH - D-55252 Mainz Kastel
>>> Managing Director: Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
>>>
>>> This email is confidential. If you are not the intended recipient,
>>> you must not disclose or use the information contained in it.
>>> If you have received this email in error, please tell us immediately by
>>> return email and delete the document.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
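One rough way to act on Mike's point in this thread is to time the retrieval and indexing phases separately, and to back out how much of the old run was actually spent adding documents. The sketch below is only an illustration under stated assumptions: `PhaseTimer`, its method names, and the stand-in lambdas are hypothetical and not part of Lucene; the estimate assumes that "about 8%" means the total run time fell by 8% and that only the add-document step got faster.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Lucene): accumulates wall-clock time per named phase. */
public class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named phase, and returns the work's result. */
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Share of all measured time spent in one phase, between 0 and 1. */
    public double share(String phase) {
        long total = 0;
        for (long nanos : nanosByPhase.values()) {
            total += nanos;
        }
        long spent = nanosByPhase.getOrDefault(phase, 0L);
        return total == 0 ? 0.0 : (double) spent / total;
    }

    /**
     * Amdahl-style estimate. If one step became stepSpeedup times faster and nothing else
     * changed, then newTotal / oldTotal = (1 - p) + p / stepSpeedup, where p is the share
     * of the old run spent in that step. Solving for p gives the expression below.
     */
    public static double impliedShare(double stepSpeedup, double newOverOldTotal) {
        return (1.0 - newOverOldTotal) / (1.0 - 1.0 / stepSpeedup);
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 100; i++) {
            final int id = i;
            // Stand-in for fetching and filtering a document from its original source.
            String text = timer.time("retrieve", () -> "document body " + id);
            // Stand-in for the call that hands the document to the index writer.
            timer.time("index", () -> text.length());
        }
        System.out.printf("retrieve share: %.2f%n", timer.share("retrieve"));
        System.out.printf("index share:    %.2f%n", timer.share("index"));
        System.out.printf("implied old share of adding documents: %.2f%n",
                impliedShare(5.0, 0.92));
    }
}
```

With the figures reported in the thread (documents added 5 times faster, total time down about 8%), `impliedShare(5.0, 0.92)` comes out at roughly 0.10. Under the assumptions above, adding documents would have been only about a tenth of the old run, which fits the explanation that most of the time is spent outside Lucene. A real profiler would give a more reliable breakdown than this kind of hand timing.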