lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: SOLR Performance
Date Mon, 03 Nov 2008 21:07:10 GMT
The indexing box can be much smaller, especially in terms of CPU.
It just needs one fast thread and enough disk.

wunder

On 11/3/08 2:58 PM, "Alok Dhir" <adhir@symplicity.com> wrote:

> I was afraid of that.  Was hoping not to need another big fat box like
> this one...
> 
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> adhir@symplicity.com
> 
> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
> 
>> I believe this is one of the reasons that a master/slave configuration
>> comes in handy. Commits to the Master don't slow down queries on the
>> Slave.
>> 
>> -Todd
>> 
>> -----Original Message-----
>> From: Alok Dhir [mailto:adhir@symplicity.com]
>> Sent: Monday, November 03, 2008 1:47 PM
>> To: solr-user@lucene.apache.org
>> Subject: SOLR Performance
>> 
>> We've moved past this issue by reducing date precision -- thanks to
>> all for the help.  Now we're at another problem.
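The thread doesn't say which precision the dt field was reduced to, or where the rounding happened. As a minimal sketch, assuming timestamps are rounded on the client before being sent to Solr (the helpers `round_down` and `to_solr_date` are hypothetical), truncating to a coarser unit could look like this:

```python
from datetime import datetime, timezone

# Hypothetical helpers: truncate a timestamp to a coarser unit before it is
# sent to Solr, so the dt field ends up holding far fewer distinct values.
UNITS = ("second", "minute", "hour", "day")


def round_down(ts: datetime, unit: str) -> datetime:
    """Truncate a timezone-aware timestamp to the given unit, in UTC."""
    ts = ts.astimezone(timezone.utc)
    if unit == "second":
        return ts.replace(microsecond=0)
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def to_solr_date(ts: datetime) -> str:
    """Format as ISO-8601 UTC with a trailing Z, e.g. 2008-10-01T04:00:00Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    event = datetime(2008, 10, 29, 16, 30, 12, 345000, tzinfo=timezone.utc)
    for unit in UNITS:
        print(unit, to_solr_date(round_down(event, unit)))
```

Rounding to the hour, for example, means every event in the same hour shares one indexed value, so a range query has far fewer distinct terms to walk.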
>> 
>> There is relatively constant updating of the index -- new log entries
>> are pumped in from several applications continuously.  Obviously, new
>> entries do not appear in searches until after a commit occurs.
>> 
>> The problem is, issuing a commit causes searches to come to a
>> screeching halt for up to 2 minutes.  We're up to around 80M docs.
>> Index size is 27G.  The number of docs will soon be 800M, which
>> doesn't bode well for these "pauses" in search performance.
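The growth concern can be put in rough numbers using the figures quoted here. This is a back-of-the-envelope sketch that assumes "27G" means 27 decimal gigabytes and that index size scales linearly with document count; neither assumption is stated in the thread:

```python
# Rough projection from the figures in this message.
# Assumptions: "27G" = 27 * 10**9 bytes, and size grows linearly with doc count.
GB = 10**9

current_docs = 80_000_000
current_index_bytes = 27 * GB
future_docs = 800_000_000

bytes_per_doc = current_index_bytes / current_docs
projected_bytes = bytes_per_doc * future_docs

print(f"~{bytes_per_doc:.0f} bytes per document")
print(f"~{projected_bytes / GB:.0f} GB at {future_docs:,} documents")
```

Under those assumptions the index would reach roughly 270 GB, ten times its current size. Whether commit pauses grow in proportion is not something the thread measures.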
>> 
>> I'd appreciate any suggestions.
>> 
>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>> 
>>> Hi -- using Solr 1.3 -- roughly 11M docs on a 64 GB, 8-core machine.
>>> 
>>> Fairly simple schema -- no large text fields, standard request
>>> handler.  4 small facet fields.
>>> 
>>> The index is an event log -- a primary search/retrieval requirement
>>> is date range queries.
>>> 
>>> A simple query without a date range subquery is ridiculously fast -
>>> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>>> 
>>> Concrete example: this query just took 18s:
>>> 
>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>> 
>>> The exact same query without the date range took 2ms.
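One way to see why the date clause dominates is to count how many distinct dt values could fall inside the example range at each precision. This is only an upper bound (the terms actually enumerated are limited to timestamps present in the index), and it assumes every indexed value has been rounded down to the stated unit:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Inclusive bounds copied from the example query in this thread.
START = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
END = datetime(2008, 10, 30, 3, 59, 59, tzinfo=timezone.utc)

# Size of one step at each precision, in milliseconds.
STEP_MS = {
    "millisecond": 1,
    "second": 1_000,
    "minute": 60_000,
    "hour": 3_600_000,
}


def epoch_ms(ts: datetime) -> int:
    """Exact milliseconds since the Unix epoch (timedelta // timedelta is an int)."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def max_distinct_values(start: datetime, end: datetime, step_ms: int) -> int:
    """Upper bound on distinct indexed values in [start, end] when every
    value has been rounded down to a multiple of step_ms."""
    lo, hi = epoch_ms(start), epoch_ms(end)
    first = -(-lo // step_ms)  # ceil(lo / step_ms)
    last = hi // step_ms       # floor(hi / step_ms)
    return max(0, last - first + 1)


if __name__ == "__main__":
    for name, step in STEP_MS.items():
        print(f"{name:>11}: {max_distinct_values(START, END, step):,}")
```

Run as-is, it reports about 2.5 billion possible values at millisecond precision, 2,505,600 at second precision and only 696 at hour precision.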
>>> 
>>> I saw a thread from Apr 2008 which explains the problem as being due to
>>> too much precision on the DateField type, with the range expansion
>>> leading to far too many elements being checked.  The proposed solution
>>> appears to be a hack where you index date fields as strings and
>>> hack together date functions to generate proper queries and format
>>> results.
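A minimal sketch of the string-based workaround described above, under assumptions the thread doesn't spell out: dates are stored in a hypothetical string field `dt_hour` as zero-padded UTC keys of the form yyyyMMddHH, so lexicographic order matches chronological order and a plain range query still works. The helper names are illustrative only:

```python
from datetime import datetime, timezone

# Hypothetical fixed-width key: year, month, day, hour in UTC.
# Zero padding keeps every key the same length, so string order == time order.
KEY_FORMAT = "%Y%m%d%H"


def date_key(ts: datetime) -> str:
    """Index-time value for a hypothetical hour-precision string field."""
    return ts.astimezone(timezone.utc).strftime(KEY_FORMAT)


def parse_key(key: str) -> datetime:
    """Turn a stored key back into a UTC datetime for display."""
    return datetime.strptime(key, KEY_FORMAT).replace(tzinfo=timezone.utc)


def range_clause(field: str, start: datetime, end: datetime) -> str:
    """Build an inclusive range clause over the string keys."""
    return f"{field}:[{date_key(start)} TO {date_key(end)}]"


if __name__ == "__main__":
    start = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
    end = datetime(2008, 10, 30, 3, 59, 59, tzinfo=timezone.utc)
    print(range_clause("dt_hour", start, end))
    # prints: dt_hour:[2008100104 TO 2008103003]
```

Because the upper key covers the whole 03:00 hour, the clause matches the same hour-aligned window as the original query ending at 03:59:59Z. Anything finer than an hour would have to be filtered elsewhere.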
>>> 
>>> Does this remain the recommended solution to this issue?
>>> 
>>> Thanks
>>> 
>> 
>> 
> 

