Message-ID: <4b124c310803250912w5e405ca7mae67a284c2e040d@mail.gmail.com>
Date: Tue, 25 Mar 2008 09:12:40 -0700
From: "Jake Mannix"
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
References: <47E3D350.8070200@sirma.bg>

Uwe,

This is a little off thread-topic, but I was wondering how your search
relevance and search performance have fared with this bigram-based index.
Is it significantly better than before you used the NGramAnalyzer?

-jake

On 3/24/08, Uwe Goetzke wrote:
> Hi Ivan,
>
> No, we do not use StandardAnalyzer or StandardTokenizer.
>
> Most data is processed by
>
> 	fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
> 	result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
> 	result = new org.apache.lucene.analysis.LowerCaseFilter(result);
> 	result = new org.apache.lucene.analysis.NGramStemFilter(result,2); // just a bigram tokenizer
>
> We use our own query parser. The bigrams are searched with a tolerant
> phrase query, which scores within each document the largest bigram
> clusters covering the phrase token.
>
> Best Regards
>
> Uwe
>
> -----Original Message-----
> From: Ivan Vasilev [mailto:ivasilev@sirma.bg]
> Sent: Friday, 21 March 2008 16:25
> To: java-user@lucene.apache.org
> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>
> Hi Uwe,
>
> Could you tell us which Analyzer you were using when you measured such
> a big indexing speedup?
> If you use StandardAnalyzer (which uses StandardTokenizer), that may be
> the reason. See the second-to-last post in the thread "Indexing
> Speed: 2.3 vs 2.2 (real world numbers)". According to its author, Jake
> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
> which is generated by JFlex instead of JavaCC.
> I am asking because I noticed a great speedup in adding documents to
> the index in our system. We time this step in debug mode. NOW
> THEY ARE ADDED 5 TIMES FASTER!!!
> At the same time, the total indexing process in our case has improved
> by only about 8%. As our system is very big and complex, I am
> wondering whether the whole indexing process really has become so much
> faster and our own system causes the slowdown, or whether Lucene does
> some optimizations on the index, merges or something else, and that is
> why the total indexing process is not correspondingly faster.
>
> Best Regards,
> Ivan
>
> Uwe Goetzke wrote:
> > This week I switched the lucene library version on one customer system.
> > The indexing time went down from 46m32s to 16m20s for the complete task
> > including optimisation. Great Job!
> > We index product catalogs from several suppliers; in this case around
> > 56,000 product groups and 360,000 products including descriptions were
> > indexed.
> >
> > Regards
> >
> > Uwe
> >
> > -----------------------------------------------------------------------
> > Healy Hudson GmbH - D-55252 Mainz Kastel
> > Managing Director Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
> >
> > This email is confidential. If you are not the intended recipient, you
> > must not disclose or use the information contained in it. If you have
> > received this email in error please tell us immediately by return email
> > and delete the document.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
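
For readers following the thread, the analysis chain quoted above (whitespace split, umlaut folding so that ö becomes oe, lowercasing, then bigrams) can be roughly sketched in plain Java. This is only an illustration under stated assumptions: BigramSketch, foldUmlauts and bigrams are hypothetical names, not Lucene APIs and not the poster's ISOLatin2AccentFilter or NGramStemFilter, whose exact output is not shown in the thread. Folding the other umlauts and passing one-character tokens through unchanged are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a plain-JDK stand-in for the analysis chain quoted above.
// None of these names are Lucene APIs or the poster's classes.
final class BigramSketch {

    // Expands German umlauts to two letters (o-umlaut -> "oe", as in the thread).
    // Handling of the other umlauts is an assumption.
    static String foldUmlauts(String token) {
        StringBuilder sb = new StringBuilder(token.length() + 2);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break;
                case '\u00f6': sb.append("oe"); break;
                case '\u00fc': sb.append("ue"); break;
                case '\u00c4': sb.append("Ae"); break;
                case '\u00d6': sb.append("Oe"); break;
                case '\u00dc': sb.append("Ue"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Splits on whitespace, folds umlauts, lowercases, then emits the
    // overlapping character bigrams of each token in order.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            String t = foldUmlauts(token).toLowerCase(Locale.ROOT);
            if (t.length() < 2) {
                out.add(t); // assumption: keep one-character tokens as they are
                continue;
            }
            for (int i = 0; i + 2 <= t.length(); i++) {
                out.add(t.substring(i, i + 2));
            }
        }
        return out;
    }
}
```

With this sketch, BigramSketch.bigrams("Öl Filter") returns [oe, el, fi, il, lt, te, er]. Because a misspelled word still shares most of its bigrams with the intended word, an index built this way can tolerate small typing errors, which is presumably what the "tolerant phrase query" mentioned in the thread relies on.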