lucene-java-user mailing list archives

From "Chris Miller" <>
Subject Re: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 09:11:47 GMT
So you have a holding table in a database (or directory on disk?) where you
store the incoming documents, correct? Does each webserver run its own
indexing thread which grabs any new documents every 20 minutes, or is there
a central process that manages that? I'm trying to understand how you know
when you can safely clean out the holding table.
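
One possible way to decide when the holding table is safe to clean, sketched under assumptions: each webserver acknowledges the highest document id it has indexed, and rows are purged only up to the lowest acknowledged id. The class `HoldingTable`, its methods, and the in-memory map standing in for the database table are all hypothetical; nothing here is confirmed as the approach used in the quoted setup.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a database holding table.
final class HoldingTable {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final Map<String, Long> acked = new HashMap<>();
    private long nextId = 1;

    // Every server that maintains its own index must be registered up front.
    HoldingTable(Collection<String> serverNames) {
        for (String name : serverNames) {
            acked.put(name, 0L);
        }
    }

    // Stores an incoming document and returns its increasing id.
    synchronized long add(String document) {
        long id = nextId++;
        pending.put(id, document);
        return id;
    }

    // Documents the given (registered) server has not yet acknowledged.
    synchronized List<String> pendingFor(String server) {
        return new ArrayList<>(pending.tailMap(acked.get(server), false).values());
    }

    // Called by a server after it has indexed everything up to lastIndexedId.
    synchronized void acknowledge(String server, long lastIndexedId) {
        acked.merge(server, lastIndexedId, Math::max);
    }

    // Removes only the rows that every server has acknowledged; returns the count.
    synchronized int purge() {
        long safe = Collections.min(acked.values());
        Map<Long, String> done = pending.headMap(safe, true);
        int removed = done.size();
        done.clear();
        return removed;
    }
}
```

With this scheme a slow or crashed server simply holds back the purge point, so nothing is deleted before all three indices have seen it.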

Did you look at having just a single process that was responsible for
updating the index, and then pushing copies out to all the webservers? I'm
wondering if that might be worth investigating (since it would take a lot of
load off the webservers that are running the searches), or if it will be too
troublesome in practice.
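
A minimal sketch of the "build once, push copies" idea, assuming the index is a flat directory of files and that each webserver finds its live index through a small pointer file. The names `IndexPublisher`, `publish`, `currentIndex`, and the `CURRENT` pointer file are hypothetical and not part of Lucene. The copy lands in a fresh versioned directory, and searches only switch over once the pointer file is replaced.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper: publishes a finished index to one server's directory.
final class IndexPublisher {
    private IndexPublisher() {}

    // Copies the built index into serverRoot/<version>, then flips the pointer.
    static Path publish(Path builtIndex, Path serverRoot, String version) throws IOException {
        Path target = serverRoot.resolve(version);
        Files.createDirectories(target);
        try (Stream<Path> files = Files.list(builtIndex)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, target.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // Write the new version name to a temp file, then rename it into place.
        // Replacing an existing target with ATOMIC_MOVE is platform dependent;
        // it works on typical POSIX file systems.
        Path tmp = serverRoot.resolve("CURRENT.tmp");
        Files.write(tmp, version.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, serverRoot.resolve("CURRENT"), StandardCopyOption.ATOMIC_MOVE);
        return target;
    }

    // Resolves the directory the search code should open right now.
    static Path currentIndex(Path serverRoot) throws IOException {
        byte[] raw = Files.readAllBytes(serverRoot.resolve("CURRENT"));
        return serverRoot.resolve(new String(raw, StandardCharsets.UTF_8).trim());
    }
}
```

Old versioned directories are left in place here, so a searcher that still has the previous index open is not disturbed; cleaning them up would be a separate step.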

Also, I'm interested to see how you handle the situation when a server gets
shut down/restarted - does it just take a copy of the index from one of the
other servers (since its own index is likely out of date)? I take it it's
not safe to copy an index while it is being updated, so you have to block on
that somehow?
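
If copying does have to wait for updates, one way to block is a read/write lock, sketched here on the assumption that the indexing thread and the copy run in the same JVM (separate processes would need something like a lock file instead). `GuardedIndex`, `IndexUpdate`, `update`, and `copyTo` are hypothetical names, not Lucene APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Stream;

// Hypothetical wrapper that serialises index updates against index copies.
final class GuardedIndex {
    // A change applied to the index directory; may fail with an I/O error.
    interface IndexUpdate {
        void apply(Path indexDir) throws IOException;
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path indexDir;

    GuardedIndex(Path indexDir) {
        this.indexDir = indexDir;
    }

    // Updates take the write lock, so no copy can be in progress meanwhile.
    void update(IndexUpdate change) throws IOException {
        lock.writeLock().lock();
        try {
            change.apply(indexDir);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copies take the read lock: several copies may run together, but they
    // wait for any running update. Returns the number of files copied.
    int copyTo(Path destination) throws IOException {
        lock.readLock().lock();
        try {
            Files.createDirectories(destination);
            int copied = 0;
            try (Stream<Path> files = Files.list(indexDir)) {
                for (Path file : (Iterable<Path>) files::iterator) {
                    if (Files.isRegularFile(file)) {
                        Files.copy(file, destination.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                    }
                }
            }
            return copied;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A restarted server could then ask a healthy peer for `copyTo(...)` output and be sure it never receives a half-written index.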

PS: It's great to hear Lucene blows Oracle out of the water! I've got some
skeptical management that needs some convincing, and hearing stories like
this helps a lot :-)

"Nader S. Henein" <> wrote in message
> I handle updates and inserts the same way: first I delete the document
> from the index and then I insert it (better safe than sorry). I batch my
> updates/inserts every twenty minutes. I would do it in smaller intervals,
> but since I have to sync the XML files created from the DB to three
> machines (I maintain three separate Lucene indices on my three separate
> web-servers) it takes a little longer. You have to batch your changes
> because updating the index takes time, as opposed to deletes, which I
> batch every two minutes. You won't have a problem updating the index and
> searching at the same time because Lucene updates the index on a
> separate set of files and then, when it's done, it overwrites the old
> version. I've had to provide for backups, and things like server crashes
> mid-indexing, but I was using Oracle Intermedia before and Lucene BLOWS
