From: jwang@dicarta.com
Subject: Commercial vendors monitoring this ML? was: Lucene Performance Issues
Date: Tue, 28 Mar 2006 10:47:10 -0800
Message-ID: <9778DA4F3D53D04B9AE80AC64AC073A30573EFDF@corpx.dicarta.com>
Reply-To: java-user@lucene.apache.org

Weird, I was just about to comment on the fact that since posting that my organization has decided to use Lucene, I got calls from two commercial vendors that didn't give me the time of day while I was doing my comparison analysis. Both of them cited some random "colleague" in the business who had referred them to me.

Jeff Wang
diCarta, Inc.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Tuesday, March 28, 2006 8:39 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene Performance Issues

Hi Thomas,

Sounds like FUD to me. No concrete numbers, and the benchmark they mention... eh, haven't we all seen "funny" benchmarks before?

Lucene is used in many large operations (e.g. Technorati, Simpy) that involve a LOT of indexing and searching, large indices, etc.

I suggest you try both and see which one suits your needs.

Otis

----- Original Message ----
From: thomasg
To: java-user@lucene.apache.org
Sent: Tuesday, March 28, 2006 5:06:54 AM
Subject: Lucene Performance Issues

Hi,

we are currently intending to implement a document storage / search tool using Jackrabbit and Lucene. We have been approached by a commercial search and indexing organisation called ISYS who are suggesting the following problems with using Lucene. We do have a requirement to store and search large documents, and the total document store will be large too. Any comments on the following would be greatly appreciated.

1) By default, Lucene only indexes the first 10,000 words from each document. When increasing this default, out-of-memory errors can occur. This implies that documents, or large sections thereof, are loaded into memory. ISYS has a very small memory footprint which is not affected by document size or number of documents.

2) Lucene appears to be slow at indexing, at least by ISYS' standards. Published performance benchmarks seem to vary between almost acceptable, down to very poor. ISYS' file readers are already optimized for the fastest text extraction possible.

3) The Lucene documentation suggests it can be slow at searching and can get slower and slower the larger your indexes get. The tipping point is where the index size exceeds the amount of free memory in your machine. This also implies that whole indexes, or large portions of them, are loaded into memory. The bigger the index, the more powerful the machine required. ISYS' search speed is always proportional to the size of the result set. Index size does not materially affect search speed and the index is never loaded into memory.

It also appears that Lucene requires hands-on tuning to keep its search speed acceptable. ISYS' indexes are self-managing and do not require any maintenance to keep them searchable at full speed.

Thanks,
Thomas

--
View this message in context: http://www.nabble.com/Lucene-Performance-Issues-t1354811.html#a3626992
Sent from the Lucene - Java Users forum at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org