lucene-java-user mailing list archives

From "Chris Miller" <chris_overs...@hotmail.com>
Subject Re: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 10:58:28 GMT
Thanks David, that's about what I figured. Of course if the servers are
pulling the information then a central holding table that contains only new
data doesn't make much sense anymore. Instead I guess the easiest approach
would be to have a central table that contains the entire dataset, and has
last-modified timestamps on each record so the individual webservers can
grab just the data that has changed since they last ran an index update. My
remaining concern is that the effort of indexing (which is potentially quite
high) is duplicated across all the webservers.
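
A minimal sketch of the last-modified idea, in plain Java with an in-memory
list standing in for the central table. The names TableRow, IncrementalPuller
and pullChanges are made up for illustration and are not part of Lucene; a
real version would presumably run a query against the database instead of
scanning a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row of the central table: an id, the text to index, and a
// last-modified timestamp in milliseconds.
final class TableRow {
    final String id;
    final String body;
    final long lastModified;

    TableRow(String id, String body, long lastModified) {
        this.id = id;
        this.body = body;
        this.lastModified = lastModified;
    }
}

// Hypothetical per-webserver puller. It remembers the newest timestamp it has
// already seen and returns only rows modified after that point.
final class IncrementalPuller {
    private long lastSeen = Long.MIN_VALUE;

    // Stands in for something like:
    //   SELECT ... FROM central_table WHERE last_modified > ?
    List<TableRow> pullChanges(List<TableRow> centralTable) {
        List<TableRow> changed = new ArrayList<>();
        long newest = lastSeen;
        for (TableRow row : centralTable) {
            // Strict comparison: a row written later with exactly the same
            // timestamp as lastSeen would be missed, so a real system needs
            // a finer clock or an overlap window.
            if (row.lastModified > lastSeen) {
                changed.add(row);
                newest = Math.max(newest, row.lastModified);
            }
        }
        lastSeen = newest;
        return changed;
    }
}
```

Each webserver would feed the returned rows to its own indexer, which is
exactly the duplicated work the concern above is about.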

Is there any reason why it would be a bad idea to have one machine
responsible for grabbing updates and adding documents to a master index, so
the other servers could periodically grab a copy of that index and hot-swap
it with their previous copy? Is Lucene capable of handling that scenario?
Seems to me that this approach would reduce the stress on the webservers even
more, and even if the indexing server went down the webservers would still
have a stale index to search against. Has anyone attempted something like
this?
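
A rough sketch of the copy-and-swap step on a webserver, using only the Java
standard library. IndexSwapper, install and currentIndex are hypothetical
names, not Lucene API. It assumes the index is a flat directory of files and
that the master is not writing to that directory while the copy runs; how a
searcher gets reopened on the new directory is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Stream;

// Hypothetical helper that keeps a local copy of the master index and
// switches searches over to a new copy in one step.
final class IndexSwapper {
    private final Path localRoot;
    private final AtomicReference<Path> current = new AtomicReference<>();
    private int generation = 0;

    IndexSwapper(Path localRoot) {
        this.localRoot = localRoot;
    }

    // Directory that searches should read from; null before the first install.
    Path currentIndex() {
        return current.get();
    }

    // Copies every file from the master's index directory into a fresh local
    // directory, then publishes it with a single atomic reference update.
    // Returns the previous directory (or null) so the caller can delete it
    // once no search is still reading from it.
    synchronized Path install(Path masterIndexDir) {
        Path fresh = localRoot.resolve("index-" + (++generation));
        try (Stream<Path> files = Files.list(masterIndexDir)) {
            Files.createDirectories(fresh);
            for (Path src : (Iterable<Path>) files::iterator) {
                Files.copy(src, fresh.resolve(src.getFileName().toString()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current.getAndSet(fresh);
    }
}
```

If the copy fails partway, the reference is never updated, so searches keep
using the older (stale) copy, which matches the fallback behaviour described
above.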


"David Medinets" <medined@mtolive.com> wrote in message
news:059601c33a3d$423547f0$6722a8c0@medined01...
> ----- Original Message -----
> From: "Chris Miller" <chris_overseas@hotmail.com>
> > Did you look at having just a single process that was responsible for
> > updating the index, and then pushing copies out to all the webservers? I'm
> > wondering if that might be worth investigating (since it would take a lot of
> > load off the webservers that are running the searches), or if it will be too
> > troublesome in practice.
>
> I've found that pulling information from a central source is simpler than
> pushing information. When information is pushed, there is much
> administration on the central server to track the recipient machines. It
> seems like servers are added and dropped from the push list. Additionally,
> you need to account for servers that stop responding. When information is
> pulled from the central source, these issues of coordination are eliminated.
>
> David Medinets
> http://www.codebits.com





