lucene-java-user mailing list archives

From "Runde, Kevin" <>
Subject RE: Commercial vendors monitoring this ML? was: Lucene Performance Issues
Date Tue, 28 Mar 2006 19:05:49 GMT
Of course they are monitoring this mailing list. Lucene rocks and it is
beating them. Do yourself a favor and dedicate some time to testing
Lucene vs. any commercial application. A little time spent up front
testing the tools can save you significant time later optimizing,
hacking in a new tool, or refactoring your program because you didn't
understand how to "really" use the tool. We did that here and were
amazed. We found that Lucene's indexes were 1/4 the size and its queries
were 4 times faster than those of several commercial tools. This was on
indexes that were much larger than physical RAM on the box.

-----Original Message-----
From: [] 
Sent: Tuesday, March 28, 2006 12:47 PM
Subject: Commercial vendors monitoring this ML? was: Lucene Performance Issues

Weird, I was just about to comment on the fact that since posting that
my organization has decided to use Lucene, I got calls from two
commercial vendors that didn't give me the time of day while I was
doing my comparison analysis.

Both of them mentioned some random "colleague" in the business
who had referred them to me.

Jeff Wang
diCarta, Inc.

-----Original Message-----
From: Otis Gospodnetic [] 
Sent: Tuesday, March 28, 2006 8:39 AM
Subject: Re: Lucene Performance Issues

Hi Thomas,

Sounds like FUD to me.  No concrete numbers, and the benchmark they
mention.... eh, haven't we all seen "funny" benchmarks before?  Lucene
is used in many large operations (e.g. Technorati, Simpy) that involve a
LOT of indexing and searching, large indices, etc.  I suggest you try
both and see which one suits your needs. 


----- Original Message ----
From: thomasg <>
Sent: Tuesday, March 28, 2006 5:06:54 AM
Subject: Lucene Performance Issues

Hi, we are currently intending to implement a document storage / search
tool using Jackrabbit and Lucene. We have been approached by a commercial
search and indexing organisation called ISYS who are suggesting the following
problems with using Lucene. We do have a requirement to store and search
large documents and the total document store will be large too. Any
advice on the following would be greatly appreciated.

1) By default, Lucene only indexes the first 10,000 words from each
document. When increasing this default, out-of-memory errors can occur. This
implies that documents, or large sections thereof, are loaded into
memory. ISYS has a very small memory footprint which is not affected by document
size nor number of documents.

2) Lucene appears to be slow at indexing, at least by ISYS' standards.
Published performance benchmarks seem to vary between almost acceptable,
down to very poor. ISYS' file readers are already optimized for the
fastest text extraction possible.

3) The Lucene documentation suggests it can be slow at searching and can
get slower and slower the larger your indexes get. The tipping point is
where the index size exceeds the amount of free memory in your machine. This
implies that whole indexes, or large portions of them, are loaded into
memory. The bigger the index, the more powerful the machine required. ISYS'
search speed is always proportional to the size of the result set. Index
size does not materially affect search speed and the index is never loaded
into memory. It also appears that Lucene requires hands-on tuning to keep
its search speed acceptable. ISYS' indexes are self-managing and do not
require any maintenance to keep them searchable at full speed.
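
For reference, the limit described in point 1 can be pictured with a minimal
sketch. In Lucene releases of that period the cap was, as far as I know, a
maximum field length setting on IndexWriter with a default of 10,000 terms.
The code below is not Lucene code: the class and method names are
hypothetical, and it only shows that a cap of this kind can be applied while
streaming a document, so the cap by itself does not require holding a whole
document in memory. It says nothing about how Lucene actually allocates
memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Toy illustration only. This is not Lucene code, and every name below
// (TokenCapDemo, countIndexedTokens, maxTokens) is hypothetical.
class TokenCapDemo {

    // Streams whitespace-separated tokens from the reader one character at a
    // time and stops as soon as a token beyond the cap begins.
    // Returns how many tokens were accepted for "indexing".
    static int countIndexedTokens(Reader in, int maxTokens) {
        int accepted = 0;
        boolean inToken = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inToken = false;
                } else if (!inToken) {
                    if (accepted == maxTokens) {
                        return accepted; // cap reached: the rest is ignored
                    }
                    accepted++;
                    inToken = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accepted;
    }

    public static void main(String[] args) {
        Reader doc = new StringReader("one two three four five");
        System.out.println(countIndexedTokens(doc, 3)); // prints 3
    }
}
```

Only one character and two counters are held at any time, whatever the
document size or the value of the cap.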

Thanks, Thomas
View this message in context:
Sent from the Lucene - Java Users forum at

To unsubscribe, e-mail:
For additional commands, e-mail:
