Chris Miller wrote:
> Thanks for your comments Ulrich. I just posted a message asking if anyone
> had attempted this approach! Sounds like you have, and it works :-) Thanks
> for the information, this sounds pretty close to what my preferred approach
> would be.
This is a good approach if the total number of documents doesn't grow
too much. Obviously, at some point full index runs take too long to be
practical.
> You say you get 2000 docs/minute. I've done some benchmarking and managed to
> get our data indexing at ~1000/minute on an Athlon 1800+ (and most of that
> speed was achieved by bumping the IndexWriter.mergeFactor up to 100 or so).
> Our data is coming from a database table, each record contains about 40
> fields, and I'm indexing 8 of those fields (an ID, 4 number fields, 3 text
> fields including one that has ~2k text). Does this sound reasonable to you,
> or do you have any tips that might improve that performance?
You need to find out where you lose most of the time:
a) in data access (your database could be too slow; in my case I am
scanning the local filesystem)
b) in parsing (probably not an issue when reading from a DB, but it is
in my case, since I have HTML files)
c) in indexing
I haven't gone to the trouble to find that out for my app, because it is
fast enough the way it is.
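
One way to find out where the time goes is to wrap each of the three
steps above in a small timer and compare the totals after a run. The
following is only a minimal sketch in plain Java. PhaseTimer is a
hypothetical helper, not part of Lucene, and the three steps in main are
stand-ins that you would replace with your real database read, your
parsing, and your call to add the document to the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named phase. */
class PhaseTimer {
    // LinkedHashMap keeps phases in the order they were first timed.
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the step, adds its elapsed time to the phase total, returns its result. */
    <T> T time(String phase, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time spent in a phase so far, in milliseconds (0 if never timed). */
    long millis(String phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    /** One line per phase, in first-seen order. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanos.entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(e.getValue() / 1_000_000).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            // Stand-in for a): reading one record from the database.
            String raw = timer.time("a) data access", () -> " record " + id + " ");
            // Stand-in for b): parsing or cleaning the raw record.
            String text = timer.time("b) parsing", () -> raw.trim());
            // Stand-in for c): handing the document to the indexer.
            timer.time("c) indexing", () -> text.length());
        }
        System.out.print(timer.report());
    }
}
```

Whichever phase dominates the report is the one worth tuning first. If
it turns out to be a), no index setting will help much.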
However, what I wonder: if you have your data in a database anyway, why
not use the database's indexing features? It seems like Lucene is an
additional layer on top of your data, which you don't really need.
cheers,
Ulrich
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org