lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Disjunctive Queries (OR queries) and FilterCache
Date Fri, 08 Nov 2013 13:34:45 GMT
Glad to hear you have a solution....

Best,
Erick


On Thu, Nov 7, 2013 at 5:12 PM, Patanachai Tangchaisin <
patanachai.tangchaisin@wizecommerce.com> wrote:

> Hi Erick,
>
> About the size of filter cache, previously we set it to 4,000.
> After we faced this problem, we changed it to 10,000.
> Even at 10,000 (always full), the hit ratio was 0.78, and evictions were
> as high as insertions.
>
> About the 100% CPU: yes, it was Solr using it.
> I profiled the app, and "DisjunctionSumScorer" was taking most of the CPU time.
> Since this is a required filter query, we send it with every request.
> My assumption is that because Solr could not use the filter cache, the filter
> query had to be executed alongside the main query each time.
>
> However, we fixed this problem by sorting our filter constraints before
> building the filter query.
> So {"1","2","3"}, {"2","3","1"} and {"3","2","1"} now all produce the same
> filter query, i.e. fq=x:("1" OR "2" OR "3").
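A minimal Python sketch of the sorting fix described above, assuming the constraint values arrive as a list of strings. The helper name `canonical_fq` is hypothetical, and its escaping handles only backslashes and double quotes, not the full Lucene query syntax.

```python
def canonical_fq(field, values):
    """Build an OR filter query whose text does not depend on input order.

    Hypothetical helper: values are de-duplicated and sorted, so every
    permutation of the same set yields the same fq string (and therefore
    the same filterCache entry).
    """
    terms = sorted(set(values))
    # Escape backslashes first, then double quotes (assumption: no other
    # special handling is needed for these values).
    quoted = ['"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in terms]
    return field + ":(" + " OR ".join(quoted) + ")"


print(canonical_fq("x", ["3", "2", "1"]))  # x:("1" OR "2" OR "3")
```

Plain string sorting puts "10" before "2", but that does not matter here: any ordering works as long as every request applies the same one.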
>
> We ended up with a very small filter cache (<1,000 entries), and the hit
> ratio is now 0.99. There are no evictions at all.
> The median response time is now under 200 ms at 25 QPS.
>
> Thanks,
> Patanachai
>
>
> On 11/07/2013 04:37 AM, Erick Erickson wrote:
>
> Yeah, Solr's fq cache is pretty simple-minded:
> order matters. There's no good way to improve
> that except to try to write your fq queries in the
> same order. It's actually quite tricky to
> disassemble/reassemble arbitrary queries to fix
> this problem.
>
> But in your case, you could write a custom query
> component that was able to handle this _specific_
> case relatively easily I should think.
>
> bq: Our machine always use 100% CPU
>
> This is strange. Are you sure it's Solr using it?
> Are there any other processes on the server that
> might be? Top (*nix) might help here. If
> it's really all Solr, then you need another slave
> or two to handle the load. Do you get good response
> times when the QPS rate is, say, 10?
>
> How big is your filter cache?
>
> A hit ratio of .76 isn't actually too bad. It looks like
> you've been running for a long time, and if so the insertion
> and eviction counts will tend toward the same number.
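To see why insertions and evictions converge, here is a toy LRU cache in Python. It is a simplified model, not Solr's own cache implementation: once the cache is full, every new insertion evicts exactly one entry, so evictions equal insertions minus capacity, and over a long run the two counts become nearly equal.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache counting lookups, hits, inserts, and evictions.

    Illustration only; this is not Solr's cache class.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.lookups = self.hits = self.inserts = self.evictions = 0

    def get_or_compute(self, key, compute):
        self.lookups += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = compute(key)
        self.data[key] = value
        self.inserts += 1
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1
        return value

    @property
    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


random.seed(0)
cache = LRUCache(capacity=100)
for _ in range(50_000):
    # 1,000 distinct keys competing for 100 slots
    cache.get_or_compute(random.randrange(1000), lambda k: k)

print(cache.inserts, cache.evictions, round(cache.hit_ratio, 2))
```

The hit ratio depends on how often keys repeat, while the eviction count simply trails the insertion count by the cache size.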
>
> Do beware of using NOW in your fq clauses, that can
> cause grief. See:
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
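The problem with `NOW` is that it resolves to the current instant, down to the millisecond, so the resulting filter differs on almost every request and rarely gets a cache hit. Rounding with date math such as `NOW/DAY` keeps the filter identical for a whole day. The Python sketch below only imitates that resolution step for a hypothetical `timestamp` field filtered on the last 7 days. It does not call Solr, and it assumes UTC.

```python
from datetime import datetime, timedelta, timezone


def resolved_fq(now, round_to_day):
    """Imitate resolving timestamp:[NOW-7DAYS TO *] or timestamp:[NOW/DAY-7DAYS TO *].

    Hypothetical field name `timestamp`; assumes `now` is a UTC datetime.
    """
    if round_to_day:
        # NOW/DAY: round down to midnight before subtracting 7 days
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = now - timedelta(days=7)
    millis = start.microsecond // 1000  # keep millisecond precision
    return "timestamp:[" + start.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z TO *]"


t1 = datetime(2013, 11, 8, 13, 34, 45, tzinfo=timezone.utc)
t2 = t1 + timedelta(seconds=1)  # the next request, one second later

print(resolved_fq(t1, False) == resolved_fq(t2, False))  # False: two cache entries
print(resolved_fq(t1, True) == resolved_fq(t2, True))    # True: one shared entry
```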
>
> This seems like really poor performance, I'm puzzled.
>
> Best,
> Erick
>
>
>
>
> On Mon, Nov 4, 2013 at 8:38 PM, Patanachai Tangchaisin <
> patanachai.tangchaisin@wizecommerce.com> wrote:
>
>
>
> Hello,
>
> We are running our search system on Apache Solr 4.2.1, using the
> Master/Slave model.
> Our index has ~100M documents. The index size is ~20 GB.
> The machine has 24 CPUs and 48 GB of RAM.
>
> Our response time is pretty bad: the median is ~4 seconds at 25
> queries/second.
>
> We noticed a few things:
> - Our machine always uses 100% CPU.
> - There is a lot of room left in the Java heap. We set -Xms12g and -Xmx16g,
> but the heap size still stays at only 12g.
> - Solr's filterCache hit ratio is only 0.76, and the numbers of insertions
> and evictions are almost equal.
>
> The weird thing is:
> - most items in Solr's filterCache (looking only at the first 100) are for
> a single field, which we filter using an OR query. Note that every
> request has this field constraint.
>
> For example, if the field name is x:
> fq=x:(1 OR 2 OR 3)&fq=y:'a'
> fq=x:(3 OR 2 OR 1)&fq=y:'b'
> fq=x:(2 OR 1 OR 3)&fq=y:'c'
>
> The order of the items varies because they are input from a different
> system.
>
> To me, it seems that Solr caches this filter as a different entry whenever
> the order of the items differs, e.g. "(1 OR 2)" and "(2 OR 1)" become
> different cache entries.
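The behaviour described above can be modelled in a few lines of Python: a cache keyed on the filter text as written treats each ordering of the same values as a separate entry, whereas a key built from the set of values would not. This is a simplified model; Solr's filterCache actually keys on the parsed query rather than the raw string, and the helper `raw_fq` is hypothetical.

```python
from itertools import permutations


def raw_fq(field, values):
    """Join the values in whatever order they arrive (no normalization)."""
    return field + ":(" + " OR ".join(values) + ")"


requests = list(permutations(["1", "2", "3"]))  # 6 orderings of one logical filter

# Key on the filter text as written: every ordering is a separate entry.
text_keys = {raw_fq("x", r) for r in requests}
# Key on (field, set of values): all orderings collapse into one entry.
set_keys = {("x", frozenset(r)) for r in requests}

print(len(text_keys), len(set_keys))  # 6 1
```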
>
> Question:
> Is there another way to create an fq parameter using 'OR' so that Solr
> caches these as the same entry?
>
>
> Thanks,
> Patanachai Tangchaisin
>
> CONFIDENTIALITY NOTICE
> ======================
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>
>
