lucene-general mailing list archives

From Ted Dunning <>
Subject Re: Performance Optimizations and Expected Benchmark Results
Date Sun, 29 Aug 2010 18:17:50 GMT
So the current state of your problem is this:

a) desired indexing speed = 10,000 objects per second (peak rate) (ish)

b) total number of objects = 10,000,000

This gives a desired from-scratch indexing time of 1000 seconds, or about 17 minutes

c) object life = 2+ days

but 2 days = roughly 170,000 seconds.  At 10,000 objects per second, this would be
1.7 billion objects, which is about 170x the actual size.  So your rate
assumptions have massive peak / valley ratios.

If all 10 million objects turn over in 2 days, the average indexing speed
only needs to be 60 objects per second.  I suspect that this is actually
considerably higher than you meant to imply, but we can still use it.

d) desired latency before search = a few minutes (call it 100 seconds)

So it sounds like your objects arrive in batches or like you reprocess all
of your objects frequently.
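
The arithmetic above can be reproduced with a short sketch. It uses only the
figures quoted in (a) through (c); the function and constant names are made up
for illustration and are not part of any Lucene API.

```python
# Back-of-envelope check of the sizing figures in (a) through (c).
# All names here are illustrative; the numbers come from the message.

PEAK_RATE = 10_000                   # (a) objects per second at peak
TOTAL_OBJECTS = 10_000_000           # (b) objects held at any one time
LIFETIME_SECONDS = 2 * 24 * 60 * 60  # (c) 2 days, exactly 172,800 seconds


def from_scratch_seconds(total_objects, rate):
    """Time to index everything once at a constant rate."""
    return total_objects / rate


def objects_at_sustained_rate(rate, lifetime_seconds):
    """Objects that would accumulate if the rate held for a whole lifetime."""
    return rate * lifetime_seconds


def average_rate(total_objects, lifetime_seconds):
    """Average rate needed if every object turns over once per lifetime."""
    return total_objects / lifetime_seconds


if __name__ == "__main__":
    rebuild = from_scratch_seconds(TOTAL_OBJECTS, PEAK_RATE)
    sustained = objects_at_sustained_rate(PEAK_RATE, LIFETIME_SECONDS)
    print("from-scratch rebuild: %.0f s (%.1f min)" % (rebuild, rebuild / 60))
    print("objects at sustained peak: %d" % sustained)
    print("peak/actual size ratio: %.1f" % (sustained / TOTAL_OBJECTS))
    print("average rate: %.1f objects/s"
          % average_rate(TOTAL_OBJECTS, LIFETIME_SECONDS))
```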

My question about incremental indexing had to do with whether you really
needed to re-index everything from scratch every time or whether it would be
feasible to simply index new objects as they arrive.  Moreover, if you
dedicate a single index per day of data and the only deletion policy is mass
expiration, then you can simply delete an index to accomplish all deletion.
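
As a rough sketch of the index-per-day idea, assume each day's index lives in
its own directory named by ISO date under a common base directory.  That
layout, and the names index_dir_for and expire_old_indexes, are assumptions
made for illustration; nothing here calls Lucene itself.  Expiration then
reduces to removing whole directories.

```python
# Sketch: one index directory per day, expired by deleting the directory.
# The directory layout (base/YYYY-MM-DD) is an assumption for illustration.
import shutil
from datetime import date, timedelta
from pathlib import Path
from typing import List


def index_dir_for(base: Path, day: date) -> Path:
    """Directory that would hold the index for a given day."""
    return base / day.isoformat()


def expire_old_indexes(base: Path, today: date, keep_days: int) -> List[str]:
    """Delete every daily index older than keep_days full days.

    Returns the names of the directories that were removed.
    """
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for child in sorted(base.iterdir()):
        if not child.is_dir():
            continue
        try:
            day = date.fromisoformat(child.name)
        except ValueError:
            continue  # not a daily index directory, leave it alone
        if day < cutoff:
            shutil.rmtree(child)  # mass expiration: drop the whole index
            removed.append(child.name)
    return removed
```

With keep_days=2, the indexes for today and the two preceding days survive,
so no object is dropped before it is at least two days old.  Searches would
have to span all surviving daily indexes.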

You earlier said that you could pretty easily achieve 1000 objects per
second indexing speed.  If we assume that your data arrives every 30 seconds
in a batch of about 2000 objects, then the indexing for this batch should
take about 2 seconds.  With 30 seconds between batches, that seems
to give you at least a 15:1 safety margin at the cost of implementing a
buffer that can store a few thousand objects.
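
A minimal sketch of that margin calculation and of the kind of buffer meant
here.  The BatchBuffer class and the index_batch callback are hypothetical
stand-ins for whatever actually writes to the index; only the numbers (1000
objects per second, 2000 objects every 30 seconds) come from this thread.

```python
# Sketch: safety margin for batched indexing, plus a simple in-memory buffer.
# BatchBuffer and index_batch are hypothetical names, not Lucene classes.
from typing import Any, Callable, List


def batch_index_seconds(batch_size: int, index_rate: float) -> float:
    """Seconds needed to index one batch at a given objects-per-second rate."""
    return batch_size / index_rate


def safety_margin(arrival_interval: float, batch_size: int,
                  index_rate: float) -> float:
    """Ratio of time available between batches to time spent indexing one."""
    return arrival_interval / batch_index_seconds(batch_size, index_rate)


class BatchBuffer:
    """Collects objects and hands them to index_batch in groups."""

    def __init__(self, flush_size: int,
                 index_batch: Callable[[List[Any]], None]) -> None:
        self._flush_size = flush_size
        self._index_batch = index_batch
        self._pending: List[Any] = []

    def add(self, obj: Any) -> None:
        """Buffer one object, flushing once flush_size objects are pending."""
        self._pending.append(obj)
        if len(self._pending) >= self._flush_size:
            self.flush()

    def flush(self) -> int:
        """Index whatever is pending; returns how many objects were sent."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, []
        self._index_batch(batch)
        return len(batch)


if __name__ == "__main__":
    # 2000 objects every 30 seconds, indexed at 1000 objects per second.
    print(safety_margin(30, 2000, 1000))
```

In practice flush() would also be called on a timer so that a partially
filled buffer still becomes searchable within the latency target.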

Why doesn't that work for you?

On Sun, Aug 29, 2010 at 1:18 AM, Ron Ratovsky <> wrote:

> Answers are within the message.
> On Fri, Aug 27, 2010 at 22:05, Ted Dunning <> wrote:
> > Can you say a bit more about your application?  How many objects total
> > are there?
> Our goal is to hold a few tens of millions of objects at any given time.
> > What is an object lifetime?
> >
> At minimum - 2 days. It can increase depending on the application stress
> (with inverse relation).
> > How soon must an object be searchable?
> Preferably asap, but a few minutes should suffice. We don't want to start
> generating a back-log since it'll just keep growing.
> > Can the index be built incrementally?
> I'm not entirely sure what you mean by that.
> > What is your search speed/throughput requirement?
> Currently, I don't have the numbers exactly. I imagine the load on the
> search would be fairly 'low', but I don't know how to quantify it yet.
