lucene-java-user mailing list archives

From Ulrich Mayring <u...@denic.de>
Subject Re: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 11:39:32 GMT
Chris Miller wrote:
> Thanks for your comments Ulrich. I just posted a message asking if anyone
> had attempted this approach! Sounds like you have, and it works :-)  Thanks
> for information, this sounds pretty close to what my preferred approach
> would be.

This is a good approach as long as the total number of documents doesn't
grow too much. At some point full index runs will obviously take too long.

> You say you get 2000 docs/minute. I've done some benchmarking and managed to
> get our data indexing at ~1000/minute on an Athlon 1800+ (and most of that
> speed was achieved by bumping the IndexWriter.mergeFactor up to 100 or so).
> Our data is coming from a database table, each record contains about 40
> fields, and I'm indexing 8 of those fields (an ID, 4 number fields, 3 text
> fields including one that has ~2k text). Does this sound reasonable to you,
> or do you have any tips that might improve that performance?

You need to find out where you lose most of the time:

a) in data access (your database could be too slow, for example; in my
case I am scanning the local filesystem)
b) in parsing (probably not an issue when reading from a DB, but it is in
my case, because I have HTML files)
c) in indexing

I haven't gone to the trouble to find that out for my app, because it is 
fast enough the way it is.
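
If you do want to measure it, one way is to accumulate wall-clock time per stage and compare the totals after a run. The following is only a rough sketch: StageTimer and its methods are hypothetical names of my own, not part of Lucene, and it assumes a JVM with lambda support.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: sums wall-clock time per named stage.
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the given work, adds its elapsed time to the stage total, returns its result.
    public <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Total nanoseconds recorded for a stage, or 0 if the stage was never timed.
    public long totalNanos(String stage) {
        return nanosByStage.getOrDefault(stage, 0L);
    }

    // Name of the stage with the largest accumulated time, or null if nothing was timed.
    public String slowestStage() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    // One line per stage, in the order the stages were first seen.
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            sb.append(String.format("%-8s %d ms%n", e.getKey(), e.getValue() / 1_000_000));
        }
        return sb.toString();
    }
}
```

Inside the indexing loop you would wrap each step in a call such as timer.time("fetch", ...), timer.time("parse", ...) and timer.time("index", ...), then print timer.report() at the end. Whichever stage dominates is the one worth tuning first.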

What I wonder, though: if your data is in a database anyway, why not use
the database's indexing features? Lucene seems to be an additional layer
on top of your data that you don't really need.

cheers,

Ulrich


