Delivered-To: mailing list java-user@lucene.apache.org
Subject: Questions about Lucene usage recommendations
Date: Mon, 27 Sep 2010 13:35:20 +0200
From: "Pawlak Michel (DCTI)"

Hello,

We have an application that uses Lucene, and we have severe performance issues (on bad days, some searches take more than 2 minutes). I'm new to Lucene, so I'm not sure it is being used correctly, and I would like some information on Lucene usage recommendations. This would help locate the problem (application code / Lucene configuration / hardware / all of them).

It would be great if a project committer or specialist could answer these questions.

First, some facts about the application:

- Lucene version in use: 2.1.0 (February 2007...)
- Around 1.4M "documents" to be indexed
- DB size (all data to be indexed is stored in DB fields): 3.5 GB
- Index size on disk: 1.6 GB (note that one cfs file is 780M, another is 600M, and the rest consists of smaller files)
- Single indexer, multiple readers (6 readers)
- Around 150 documents are modified per day
- Indexing is done right after every modification
- Simple searches can take ages (for instance, searching for "chocolate" can take more than 2 minutes)
- I do not have access to the source code (yes, that's the funny part)

My questions:

- Is this version of Lucene still supported?
- What are the main reasons, if any, to use the latest version of Lucene instead of 2.1.0?
  (for instance: performance, stability, critical fixes, support, etc.) (The answer may seem obvious, but I would like to have an official answer.)
- Are there any storage recommendations that every Lucene user should know? (Not benchmarks, but recommendations such as "better use physical HDD", "do not use NFS if possible", "if your cfs files are greater than XYZ, better use this kind of storage", "if you have more than XYZ searches per second, better...", etc.)
- Is there any recommendation concerning cfs file size?
- Is there a way to limit the size of cfs files?
- What is the impact on search performance if cfs file size is limited?
- How often should optimization occur? (Every day, week, month?)
- I saw that IndexWriter has methods such as
    setMaxFieldLength()
    setMergeFactor()
    setMaxBufferedDocs()
    setMaxMergeDocs()
  Can you briefly explain how these can affect performance?
- Are there any other recommendations that "dummies" should be informed of and that every expert knows? For instance, a list of Lucene patterns / anti-patterns that may affect performance.

If my questions are not precise enough, do not hesitate to ask for details. If you see an obvious problem, do not hesitate to tell me.

A big thank you in advance for your help,

Best regards,
Michel