lucene-java-user mailing list archives

From "Jake Mannix" <jake.man...@gmail.com>
Subject Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
Date Tue, 25 Mar 2008 16:12:40 GMT
Uwe,
  This is a little off the thread topic, but I was wondering how your
search relevance and search performance have fared with this
bigram-based index.  Is it significantly better than before you used
the NGramAnalyzer?
   -jake



On 3/24/08, Uwe Goetzke <uwe.goetzke@healy-hudson.com> wrote:
> Hi Ivan,
> No, we do not use StandardAnalyzer or StandardTokenizer.
>
> Most data is processed by
> 	fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
> 	result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
> 	result = new org.apache.lucene.analysis.LowerCaseFilter(result);
> 	result = new org.apache.lucene.analysis.NGramStemFilter(result,2); // just a bigram tokenizer
>
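
To make the chain above concrete, here is a minimal plain-Java sketch of what such a pipeline would emit. It does not use Lucene and is not the ISOLatin2AccentFilter or NGramStemFilter from the message, whose source is not shown. Only the "ö -> oe" rule comes from the message; the other umlaut mappings, the class and method names, and the handling of tokens shorter than two characters are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch: whitespace split -> accent folding -> lowercase -> bigrams.
class BigramChainSketch {

    // Stand-in for the custom accent filter. Only the o-umlaut rule is taken
    // from the message; the remaining mappings are assumed by analogy.
    static String foldAccents(String s) {
        return s.replace("\u00f6", "oe").replace("\u00d6", "Oe")
                .replace("\u00e4", "ae").replace("\u00c4", "Ae")
                .replace("\u00fc", "ue").replace("\u00dc", "Ue")
                .replace("\u00df", "ss");
    }

    // Overlapping character n-grams of one token. Tokens no longer than n
    // are emitted unchanged (an assumption, not taken from the message).
    static List<String> ngrams(String token, int n) {
        List<String> grams = new ArrayList<>();
        if (token.length() <= n) {
            grams.add(token);
            return grams;
        }
        for (int i = 0; i + n <= token.length(); i++) {
            grams.add(token.substring(i, i + n));
        }
        return grams;
    }

    // Applies the steps in the same order as the filter chain in the message.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            String normalized = foldAccents(token).toLowerCase(Locale.ROOT);
            out.addAll(ngrams(normalized, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [go, oe, et, tz, zk, ke, ka, at, ta, al, lo, og]
        System.out.println(analyze("G\u00f6tzke Katalog"));
    }
}
```

Because folding happens before the bigram step, "Götzke" and "Goetzke" produce identical bigrams, which is presumably the point of mapping ö to oe rather than to o.
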
> We use our own query parser. The bigrams are searched with a tolerant phrase
> query that scores, for each document, the largest bigram clusters covering
> the phrase token.
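
The custom query parser is not shown, so the following is only one possible reading of "largest bigram cluster", sketched in plain Java: score a document token by the longest contiguous run of query bigrams it shares with the query token, divided by the number of query bigrams. The class name, method names, and the normalization are all assumptions, not the scoring actually used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "tolerant" match between a query token and a document token,
// based on the longest contiguous run of shared bigrams.
class BigramClusterSketch {

    // Overlapping character bigrams; tokens shorter than 2 chars yield none.
    static List<String> bigrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= token.length(); i++) {
            grams.add(token.substring(i, i + 2));
        }
        return grams;
    }

    // Longest run of bigrams that occurs consecutively in both sequences
    // (classic longest-common-substring dynamic program over bigrams).
    static int longestSharedRun(List<String> query, List<String> doc) {
        int best = 0;
        int[] prev = new int[doc.size() + 1];
        for (int i = 1; i <= query.size(); i++) {
            int[] cur = new int[doc.size() + 1];
            for (int j = 1; j <= doc.size(); j++) {
                if (query.get(i - 1).equals(doc.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    // Fraction of the query token's bigrams covered by the best cluster.
    static double score(String queryToken, String docToken) {
        List<String> q = bigrams(queryToken);
        if (q.isEmpty()) {
            return 0.0;
        }
        return (double) longestSharedRun(q, bigrams(docToken)) / q.size();
    }

    public static void main(String[] args) {
        System.out.println(score("goetzke", "goetzke")); // prints 1.0
        System.out.println(score("goetzke", "gotzke"));  // prints 0.5
    }
}
```

In the second call the misspelled document token still shares the run tz, zk, ke with the query, so it scores 3 of 6 bigrams instead of failing to match outright.
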
>
> Best Regards
>
> Uwe
>
> -----Original Message-----
> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
> Sent: Friday, 21 March 2008 16:25
> To: java-user@lucene.apache.org
> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>
> Hi Uwe,
>
> Could you tell us which Analyzer you used when you measured such a big
> indexing speedup?
> If you use StandardAnalyzer (which uses StandardTokenizer), that may be the
> reason. See the second-to-last post in the thread "Indexing
> Speed: 2.3 vs 2.2 (real world numbers)". According to the poster, Jake
> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
> which is generated by JFlex instead of JavaCC.
> I am asking because I noticed a great speedup in adding documents to the
> index in our system. We time this step in debug mode. NOW
> THEY ARE ADDED 5 TIMES FASTER!!!
> At the same time, the total indexing process in our case has improved
> by only about 8%. As our system is very big and complex, I am
> wondering whether indexing itself really got that much faster and
> our system causes the slowdown, or whether Lucene does
> some optimizations on the index, merges, or something else, and that is
> why the total indexing process is not correspondingly faster.
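
The two figures above can be reconciled with a back-of-the-envelope estimate in the style of Amdahl's law. The sketch below assumes that "about 8%" means the total run time dropped by 8%, and that only the add-document step got faster (by the reported factor of 5). Under those assumptions, if p is the share of the old total spent adding documents, then newTotal / oldTotal = (1 - p) + p / 5. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Estimates how much of the old run time the sped-up step accounted for.
class AmdahlSketch {

    // Solves timeRatio = (1 - p) + p / localSpeedup for p, where
    // timeRatio is newTotal / oldTotal and p is the sped-up share.
    static double spedUpFraction(double timeRatio, double localSpeedup) {
        return (1.0 - timeRatio) / (1.0 - 1.0 / localSpeedup);
    }

    public static void main(String[] args) {
        // 8% less total time (ratio 0.92) with a 5x faster add-document step.
        double p = spedUpFraction(0.92, 5.0);
        System.out.println(String.format(Locale.ROOT, "%.2f", p)); // prints 0.10
    }
}
```

On that reading, adding documents made up only about 10% of the original total, so the remaining 90% (whether spent in the surrounding system or in merges and optimization) would dominate the overall time no matter how fast addDocument becomes.
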
>
> Best Regards,
> Ivan
>
>
>
> Uwe Goetzke wrote:
> > This week I switched the Lucene library version on one customer system.
> > The indexing time went down from 46m32s to 16m20s for the complete task,
> > including optimisation. Great job!
> > We index product catalogs from several suppliers; in this case around
> > 56,000 product groups and 360,000 products, including descriptions, were
> > indexed.
> >
> > Regards
> >
> > Uwe
> >
> >
> >
> > -----------------------------------------------------------------------
> > Healy Hudson GmbH - D-55252 Mainz Kastel
> > Managing Director: Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
> >
> > This email is confidential. If you are not the intended recipient, you
> > must not disclose or use the information contained in it. If you have
> > received this email in error, please tell us immediately by return email
> > and delete the document.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
