lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: Periodic Indexing DESIGN QUESTION
Date Wed, 09 May 2007 03:59:20 GMT
: For example, when you are indexing every hour and large document set
: is present, it takes >1 hr to index the documents.  Now you are
: already behind indexing for the next hour.  How do you design
: something that is robust?

Fundamentally, this question is really more about the issues in a
producer/consumer model than it is specifically about indexing. Given a
situation where data comes into a queue (from some set of producers) and
you want to process that data (by some set of consumers), what do you do
if the producers produce faster than the consumers consume?
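
A minimal sketch of the producer/consumer model described above, using the
JDK's java.util.concurrent.BlockingQueue. The class name, the end-of-input
sentinel, and the list standing in for an index are hypothetical; no Lucene
API is used, and a real consumer would do its indexing work where the sketch
appends to the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerSketch {
    // Hypothetical sentinel telling the consumer that no more documents will arrive.
    // Assumes no real document ever has this exact text.
    static final String POISON_PILL = "__END__";

    public static List<String> run(List<String> docs) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> indexed = new ArrayList<>(); // only touched by the consumer until join()

        Thread producer = new Thread(() -> {
            try {
                for (String doc : docs) {
                    queue.put(doc);          // hand each document to the consumer side
                }
                queue.put(POISON_PILL);      // signal end of input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();   // waits until something is available
                    if (POISON_PILL.equals(doc)) {
                        break;
                    }
                    indexed.add(doc);            // stand-in for the real indexing work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // join() makes the consumer's writes to 'indexed' visible here
        return indexed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("doc1", "doc2", "doc3"))); // prints [doc1, doc2, doc3]
    }
}
```

The trouble described in the question starts when the producer loop runs
faster than the consumer loop: the queue between them just keeps growing.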

I know of 7 options:
  1) decrease the number of producers
  2) make the producers produce slower
  3) make the queue infinitely large
  4) make the queue block
  5) make the consumers consume faster
  6) increase the number of consumers
  7) throw away data
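
Options #3, #4 and #7 map fairly directly onto how a JDK queue is used. A
sketch, assuming java.util.concurrent queues (the helper names are
hypothetical): an unbounded LinkedBlockingQueue is the closest practical
stand-in for #3 (it is really capped at Integer.MAX_VALUE), put() on a full
ArrayBlockingQueue blocks the producer (#4), and offer() returns false instead
of blocking, which lets the producer discard the item (#7).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueFullPolicies {
    // Option #7: try to enqueue without blocking; if the queue is full the item is discarded.
    // Returns true if accepted, false if dropped.
    static boolean enqueueOrDrop(BlockingQueue<String> queue, String item) {
        return queue.offer(item);
    }

    // Option #4 with a safety valve: block the producer for up to timeoutMillis,
    // then give up and let the caller decide whether to drop or retry.
    static boolean enqueueOrWait(BlockingQueue<String> queue, String item, long timeoutMillis)
            throws InterruptedException {
        return queue.offer(item, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue with room for two items and no consumer draining it.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        System.out.println(enqueueOrDrop(queue, "doc1"));     // true
        System.out.println(enqueueOrDrop(queue, "doc2"));     // true
        System.out.println(enqueueOrDrop(queue, "doc3"));     // false: full, doc3 dropped
        System.out.println(enqueueOrWait(queue, "doc4", 50)); // false after ~50 ms
        // queue.put("doc5") would block here indefinitely: pure option #4.
        System.out.println(queue.size());                     // 2
    }
}
```

The timed offer() is a middle ground: producers are slowed down briefly when
the consumers fall behind, but they never wait forever.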

#1, #2 and #3 are not usually practical but are listed for completeness.
#4 may be practical in some situations, but there are no easy rules to
know when. #6 tends to be very feasible in a well-designed system where
things can be parallelized, while #5 can frequently be achieved either
by profiling and optimizing your code or by making your code do less,
which segues nicely to #7.

It may sound like a joke, but big throughput gains can frequently be
made by reducing the amount of data being sent to the consumers.
Sometimes it's a matter of taking some of the work the consumers do and
making the producers do it instead (i.e. filtering out, on the producer
side, data that you know you aren't going to index). In other cases it
may truly be throwing away data: you can see that your queue is so full
that you switch into a "critical info only" mode where you don't bother
to process every little bit of data, just the important stuff. You make
the conscious choice that it's better to be caught up on the big stuff
than to fall way, way behind dealing with the little stuff.
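
One way to combine #6 and #7, sketched under assumptions: a fixed pool of
consumer threads drains a shared queue, and the producer side switches into
"critical info only" mode once the backlog reaches a threshold. The Doc type,
its critical flag, the threshold, and all class and method names are
hypothetical, and the threshold check is approximate (size() can change
between the check and the put), which is usually acceptable for load shedding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadSheddingIndexer {
    // Hypothetical work item; "critical" marks the important stuff.
    static final class Doc {
        final String id;
        final boolean critical;
        Doc(String id, boolean critical) { this.id = id; this.critical = critical; }
    }

    // Stop marker for the workers, compared by reference.
    private static final Doc STOP = new Doc("__stop__", true);

    private final BlockingQueue<Doc> queue = new LinkedBlockingQueue<>();
    private final int backlogThreshold;
    private final AtomicInteger dropped = new AtomicInteger();
    private final List<String> indexed = Collections.synchronizedList(new ArrayList<>());

    LoadSheddingIndexer(int backlogThreshold) { this.backlogThreshold = backlogThreshold; }

    // Producer side (option #7): once the backlog reaches the threshold,
    // non-critical docs are discarded instead of queued.
    boolean submit(Doc doc) throws InterruptedException {
        if (!doc.critical && queue.size() >= backlogThreshold) {
            dropped.incrementAndGet();
            return false;
        }
        queue.put(doc);
        return true;
    }

    // Consumer side (option #6): drain the queue with several worker threads.
    // One-shot for the sketch: call once, after all submits are done.
    List<String> drainWith(int consumers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    for (Doc d = queue.take(); d != STOP; d = queue.take()) {
                        indexed.add(d.id); // stand-in for the real indexing work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(STOP); // FIFO: every worker sees all real docs before a STOP
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new ArrayList<>(indexed);
    }

    int droppedCount() { return dropped.get(); }

    public static void main(String[] args) throws InterruptedException {
        LoadSheddingIndexer indexer = new LoadSheddingIndexer(3);
        // A burst of ten docs arrives before any consumer runs; every third one is critical.
        for (int i = 0; i < 10; i++) {
            indexer.submit(new Doc("doc" + i, i % 3 == 0));
        }
        List<String> done = indexer.drainWith(4);
        System.out.println("indexed=" + done.size() + " dropped=" + indexer.droppedCount());
        // prints indexed=6 dropped=4
    }
}
```

Dropping on the producer side means the shed documents never cost the
consumers anything, which is where the throughput gain of #7 comes from.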

