From: Otis Gospodnetic
Date: Tue, 28 Mar 2006 08:39:26 -0800 (PST)
To: java-user@lucene.apache.org
Subject: Re: Lucene Performance Issues

Hi Thomas,

Sounds like FUD to me. No concrete numbers, and the benchmark they mention... eh, haven't we all seen "funny" benchmarks before? Lucene is used in many large operations (e.g. Technorati, Simpy) that involve a LOT of indexing and searching, large indices, etc. I suggest you try both and see which one suits your needs.

Otis

----- Original Message ----
From: thomasg
To: java-user@lucene.apache.org
Sent: Tuesday, March 28, 2006 5:06:54 AM
Subject: Lucene Performance Issues

Hi,

We are currently intending to implement a document storage/search tool using Jackrabbit and Lucene. We have been approached by a commercial search and indexing organisation called ISYS, who are suggesting the following problems with using Lucene. We do have a requirement to store and search large documents, and the total document store will be large too. Any comments on the following would be greatly appreciated.

1) By default, Lucene only indexes the first 10,000 words from each document. When this default is increased, out-of-memory errors can occur. This implies that documents, or large sections thereof, are loaded into memory. ISYS has a very small memory footprint, which is affected by neither document size nor the number of documents.

2) Lucene appears to be slow at indexing, at least by ISYS' standards. Published performance benchmarks seem to vary from almost acceptable down to very poor. ISYS' file readers are already optimized for the fastest text extraction possible.

3) The Lucene documentation suggests it can be slow at searching and can get slower and slower the larger your indexes get. The tipping point is where the index size exceeds the amount of free memory in your machine. This also implies that whole indexes, or large portions of them, are loaded into memory. The bigger the index, the more powerful the machine required.
   ISYS' search speed is always proportional to the size of the result set. Index size does not materially affect search speed, and the index is never loaded into memory. It also appears that Lucene requires hands-on tuning to keep its search speed acceptable. ISYS' indexes are self-managing and do not require any maintenance to keep them searchable at full speed.

Thanks,
Thomas
--
View this message in context: http://www.nabble.com/Lucene-Performance-Issues-t1354811.html#a3626992
Sent from the Lucene - Java Users forum at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
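Point 1 in Thomas's message refers to Lucene's per-field term cap. In the Lucene 1.9/2.0 API this is IndexWriter.setMaxFieldLength(int), whose default (IndexWriter.DEFAULT_MAX_FIELD_LENGTH) is 10,000 terms per field. The stdlib-only Java sketch below is a hypothetical illustration, not Lucene code: the class and method names are made up, and it assumes plain whitespace tokenization where Lucene would use an Analyzer. It only shows the effect of such a cap, namely that tokens past the limit are silently dropped rather than indexed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Lucene code): mimics the effect of a
 * per-field term cap such as Lucene's maxFieldLength, where only the
 * first N tokens of a field are indexed and the rest are ignored.
 */
public class FieldLengthCap {

    // Same numeric value as Lucene's default cap of 10,000 terms per field.
    public static final int DEFAULT_MAX_FIELD_LENGTH = 10_000;

    /** Splits text on whitespace (an assumption) and keeps at most maxTokens tokens. */
    public static List<String> indexableTokens(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;       // blank input splits to [""]
            if (kept.size() >= maxTokens) break; // cap reached: remaining tokens are dropped
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a synthetic 12,000-token "document": word0 word1 ... word11999
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 12_000; i++) {
            doc.append("word").append(i).append(' ');
        }
        List<String> tokens = indexableTokens(doc.toString(), DEFAULT_MAX_FIELD_LENGTH);
        System.out.println(tokens.size());                 // 10000
        System.out.println(tokens.get(tokens.size() - 1)); // word9999
    }
}
```

Running main prints 10000 and then word9999: the last 2,000 tokens of the synthetic document never reach the index. The sketch says nothing about how Lucene buffers text in memory; it only models which tokens survive the cap.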