lucene-java-user mailing list archives

From "Nader S. Henein" <>
Subject RE: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 11:55:07 GMT
We were using Oracle Intermedia before we switched to Lucene. Lucene
has been much faster, and it has allowed us to distribute our search
functionality over multiple servers. Intermedia, which is supposedly one
of the best in the business, couldn't hold a candle to Lucene, and our
Oracle installation and setup is impeccable: we spent years perfecting
it before we decided to separate from Intermedia and use Oracle as a
DBMS, not a search engine. Also, when you use Lucene and not a
proprietary product like Intermedia, you can switch databases at will
if licensing fees become too high to ignore.


-----Original Message-----
From: news [] On Behalf Of Ulrich Mayring
Sent: Tuesday, June 24, 2003 3:40 PM
Subject: Re: commercial websites powered by Lucene?

Chris Miller wrote:
> Thanks for your comments Ulrich. I just posted a message asking if
> anyone had attempted this approach! Sounds like you have, and it works
> :-)  Thanks for the information, this sounds pretty close to what my
> preferred approach would be.

This is a good approach if the number of total documents doesn't grow 
too much. There's obviously a limit to full index runs at some point.

> You say you get 2000 docs/minute. I've done some benchmarking and
> managed to get our data indexing at ~1000/minute on an Athlon 1800+
> (and most of that speed was achieved by bumping the
> IndexWriter.mergeFactor up to 100 or so). Our data is coming from a
> database table, each record contains about 40 fields, and I'm indexing
> 8 of those fields (an ID, 4 number fields, 3 text fields including one
> that has ~2k text). Does this sound reasonable to you, or do you have
> any tips that might improve that performance?

You need to find out where you lose most of the time:

a) in data access (like your database could be too slow, in my case I am
scanning the local filesystem)
b) in parsing (probably not an issue when reading from a DB, but in my 
case it is, I have HTML files)
c) in indexing
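
A minimal sketch of how the three stages above could be timed
separately. The StageTimer class and the stand-in work inside main are
hypothetical (they are not part of Lucene or of either poster's
application); in a real indexer the three lambdas would wrap the
database read, the parsing step and the IndexWriter call respectively.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: accumulates wall-clock time per named stage.
public class StageTimer {
    // Total elapsed nanoseconds per stage, kept in first-use order.
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the stage, returns its result.
    public <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totals.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a stage, or 0 if it was never timed.
    public long nanos(String stage) {
        return totals.getOrDefault(stage, 0L);
    }

    // Name of the stage with the largest total, or null if nothing was timed.
    public String slowestStage() {
        String best = null;
        long bestNanos = -1L;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (e.getValue() > bestNanos) {
                best = e.getKey();
                bestNanos = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int id = 0; id < 1000; id++) {
            final int recordId = id;
            // Stand-ins for the real work; replace with actual calls.
            String raw = timer.time("data access", () -> "record " + recordId);
            String[] tokens = timer.time("parsing", () -> raw.split(" "));
            timer.time("indexing", () -> tokens.length);
        }
        for (String stage : new String[] {"data access", "parsing", "indexing"}) {
            System.out.println(stage + ": " + timer.nanos(stage) / 1_000_000.0 + " ms");
        }
        System.out.println("slowest: " + timer.slowestStage());
    }
}
```

Timing whole runs over a few thousand records, rather than single
calls, keeps the measurement overhead small compared to the work being
measured.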

I haven't gone to the trouble to find that out for my app, because it is
fast enough the way it is.

However, what I wonder: if you have your data in a database anyway, why 
not use the database's indexing features? It seems like Lucene is an 
additional layer on top of your data, which you don't really need.



