lucene-general mailing list archives

From Ron Ratovsky <r...@correlsense.com>
Subject Re: Performance Optimizations and Expected Benchmark Results
Date Mon, 30 Aug 2010 07:53:35 GMT
Hi Ted and Jenny,
Thanks for both your responses.
Regarding Jenny's question: yes, there's no problem processing the objects in
batches. I'd be interested to know why that affects performance.
As for the numbers and calculations, Ted, thanks for those. They made us
realize that our requirements are not yet clear enough.
We clearly still have some work to do before we can give out actual numbers,
but once we have them, I'll post back here.

On Sun, Aug 29, 2010 at 21:17, Ted Dunning <ted.dunning@gmail.com> wrote:

> So the current state of your problem is this:
>
> a) desired indexing speed = 10,000 objects per second (peak rate) (ish)
>
> b) total number of objects = 10,000,000
>
> This gives desired from-scratch indexing time of 1000 seconds = 17 minutes
>
> c) object life = 2+ days
>
> but 2 days = 172,800 seconds.  At 10,000 objects per second, this would be
> about 1.7 billion objects, which is more than 170x the actual size.  So your
> rate assumptions have massive peak / valley ratios.
>
> If all 10 million objects turn over in 2 days, the average indexing speed
> only needs to be 60 objects per second.  I suspect that this is actually
> considerably higher than you meant to imply, but we can still use it.
>
> d) desired latency before search = a few minutes (call it 100 seconds)
>
> So it sounds like your objects arrive in batches or like you reprocess all
> of your objects frequently.
>
> My question about incremental indexing had to do with whether you really
> needed to re-index everything from scratch every time or whether it would
> be
> feasible to simply index new objects as they arrive.  Moreover, if you
> dedicate a single index per day of data and
> the only deletion policy is mass expiration, then you can simply delete an
> index to accomplish all deletion.
>
> You earlier said that you could pretty easily achieve 1000 objects per
> second indexing speed.  If we assume that your data arrives every 30
> seconds
> in a batch of about 2000 objects, then the indexing for this batch should
> take about 2 seconds.  That seems
> to give you at least a 15:1 safety margin at the cost of implementing a
> buffer that can store a few thousand objects.
>
> Why doesn't that work for you?
>
>
> On Sun, Aug 29, 2010 at 1:18 AM, Ron Ratovsky <ronr@correlsense.com>
> wrote:
>
> > Answers are within the message.
> >
> > On Fri, Aug 27, 2010 at 22:05, Ted Dunning <ted.dunning@gmail.com>
> wrote:
> >
> > > Can you say a bit more about your application?  How many objects total
> > are
> > > there?
> >
> > Our goal is to hold a few tens of millions of objects at any given time.
> >
> >
> > > What is an object lifetime?
> > >
> > At minimum, 2 days. It can grow depending on the load on the application
> > (inversely: the lower the load, the longer objects are kept).
> >
> >
> > > How soon must an object be searchable?
> >
> > Preferably ASAP, but a few minutes should suffice. We don't want to start
> > building a backlog, since it would just keep growing.
> >
> > > Can the index be built incrementally?
> >
> > I'm not entirely sure what you mean by that.
> >
> > > What is your search speed/throughput requirement?
> >
> > Currently, I don't have exact numbers. I imagine the search load would be
> > fairly low, but I don't know how to quantify it yet.
> >
>
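
For anyone following along, Ted's index-per-day idea with deletion by mass
expiration might look roughly like this. This is a hypothetical Python
stand-in, not actual Lucene code; the class and method names are made up for
illustration. With Lucene, each day's bucket would be its own index directory,
and expiring a day would mean deleting that directory instead of issuing
per-document deletes.

```python
from collections import OrderedDict

class DailyIndexes:
    """Hypothetical sketch: one sub-index per day. Deletion happens only by
    dropping a whole expired day, never by per-document deletes."""

    def __init__(self, retention_days=2):
        self.retention_days = retention_days
        self.days = OrderedDict()   # day number -> list of indexed objects

    def add(self, day, obj):
        # Index the object into its day's bucket, then expire old days.
        self.days.setdefault(day, []).append(obj)
        self._expire(day)

    def _expire(self, current_day):
        # Drop every day-index older than the retention window in one shot
        # (with Lucene: delete that day's index directory).
        cutoff = current_day - self.retention_days
        for day in [d for d in self.days if d <= cutoff]:
            del self.days[day]

    def search(self, predicate):
        # A real query would fan out over the live per-day indexes.
        return [o for objs in self.days.values() for o in objs
                if predicate(o)]
```

For example, after adding objects on days 1, 2, and 4 with a 2-day retention,
only day 4's bucket survives, and a search sees only its contents.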
