lucene-solr-user mailing list archives

From Alok Dhir <ad...@symplicity.com>
Subject Re: SOLR Performance
Date Mon, 03 Nov 2008 22:16:27 GMT
In terms of RAM -- how should that be sized on the indexer?

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
adhir@symplicity.com

On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:

> The indexing box can be much smaller, especially in terms of CPU.
> It just needs one fast thread and enough disk.
>
> wunder
>
> On 11/3/08 2:58 PM, "Alok Dhir" <adhir@symplicity.com> wrote:
>
>> I was afraid of that.  Was hoping not to need another big fat box like
>> this one...
>>
>> ---
>> Alok K. Dhir
>> Symplicity Corporation
>> www.symplicity.com
>> (703) 351-0200 x 8080
>> adhir@symplicity.com
>>
>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>
>>> I believe this is one of the reasons that a master/slave configuration
>>> comes in handy. Commits to the Master don't slow down queries on the
>>> Slave.
>>>
>>> -Todd
>>>
>>> -----Original Message-----
>>> From: Alok Dhir [mailto:adhir@symplicity.com]
>>> Sent: Monday, November 03, 2008 1:47 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: SOLR Performance
>>>
>>> We've moved past this issue by reducing date precision -- thanks to
>>> all for the help.  Now we're at another problem.
>>>
>>> There is relatively constant updating of the index -- new log entries
>>> are pumped in from several applications continuously.  Obviously, new
>>> entries do not appear in searches until after a commit occurs.
>>>
>>> The problem is, issuing a commit causes searches to come to a
>>> screeching halt for up to 2 minutes.  We're up to around 80M docs.
>>> Index size is 27G.  The number of docs will soon be 800M, which
>>> doesn't bode well for these "pauses" in search performance.
>>>
>>> I'd appreciate any suggestions.
>>>
>>> ---
>>> Alok K. Dhir
>>> Symplicity Corporation
>>> www.symplicity.com
>>> (703) 351-0200 x 8080
>>> adhir@symplicity.com
>>>
>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>>
>>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
>>>>
>>>> Fairly simple schema -- no large text fields, standard request
>>>> handler.  4 small facet fields.
>>>>
>>>> The index is an event log -- a primary search/retrieval requirement
>>>> is date range queries.
>>>>
>>>> A simple query without a date range subquery is ridiculously fast -
>>>> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>>>>
>>>> Concrete example, this query just took 18s:
>>>>
>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO
>>>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>>
>>>> The exact same query without the date range took 2ms.
>>>>
>>>> I saw a thread from Apr 2008 which explains the problem as being due to
>>>> too much precision on the DateField type, and the range expansion
>>>> leading to far too many elements being checked.  Proposed solution
>>>> appears to be a hack where you index date fields as strings and
>>>> hack together date functions to generate proper queries/format
>>>> results.
>>>>
>>>> Does this remain the recommended solution to this issue?
>>>>
>>>> Thanks
>>>>
>>>> ---
>>>> Alok K. Dhir
>>>> Symplicity Corporation
>>>> www.symplicity.com
>>>> (703) 351-0200 x 8080
>>>> adhir@symplicity.com
>>>>
>>>
>>>
>>
>
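
The thread says the slow date range queries were fixed by "reducing date
precision" but does not show how. A minimal sketch of one way to do that
follows, assuming timestamps are truncated to the hour both at index time
and at query time. The hourly granularity and the helper names
(floor_to_hour, index_value, date_range_clause) are illustrative
assumptions, not something the posters describe; only the field name dt
and the timestamp format come from the query quoted above.

```python
from datetime import datetime, timezone

# Timestamp layout used in the query quoted in the thread.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def floor_to_hour(ts: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC and drop everything below the hour."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def index_value(ts: datetime) -> str:
    """Value to store in the date field: far fewer distinct terms than per-second stamps."""
    return floor_to_hour(ts).strftime(SOLR_FORMAT)


def date_range_clause(field: str, start: datetime, end: datetime) -> str:
    """Build an inclusive range clause over whole-hour buckets.

    Both bounds are floored because the stored values are floored too:
    an event at 03:59:59 is indexed as 03:00:00, so the upper bound
    03:00:00 still matches it.
    """
    lo = floor_to_hour(start).strftime(SOLR_FORMAT)
    hi = floor_to_hour(end).strftime(SOLR_FORMAT)
    return f"{field}:[{lo} TO {hi}]"


if __name__ == "__main__":
    start = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
    end = datetime(2008, 10, 30, 3, 59, 59, tzinfo=timezone.utc)
    print(date_range_clause("dt", start, end))
```

The trade-off is that range boundaries can no longer be finer than one
hour: a start bound of 04:17 would also match events from 04:00 to 04:17.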

