lucene-solr-user mailing list archives

From Binoy Dalal <binoydala...@gmail.com>
Subject Re: enable disable filter query caching based on statistics
Date Tue, 05 Jan 2016 19:02:59 GMT
@Erick, I might be wrong here, so please correct me if I am.
In the particular case that Matteo has given, applying the filters as post
filters won't make any difference, since the query is going to return all
docs anyway. In such a case, won't applying fqs normally be the same as
applying them as post filters?
I assume that was your intention in writing the costs as > 100.
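
To make the variants under discussion concrete, here is a minimal Python sketch that builds the three forms of the request: cached fqs, uncached fqs, and uncached fqs with the explicit costs Erick suggests. The helper names (build_query, fq_values) are hypothetical, and the field values are the placeholders from the thread. One caveat, as far as I understand the Solr documentation: cost >= 100 only turns a filter into a post filter when the query type implements the PostFilter interface; for plain field queries like these, the cost should only affect evaluation order.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder filters copied from the thread; xxx/yyyy/zzzz are not real values.
FILTERS = ["n_rea:xxx", "provincia:yyyy", "type:zzzz"]


def build_query(filters, cache=True, base_cost=None):
    """Return a URL-encoded query string for q=*:* with one fq per filter.

    cache=False prefixes each fq with the cache=false local param.
    base_cost, if given, assigns costs base_cost+1, base_cost+2, ...
    in list order, so the first filter gets the lowest cost.
    """
    params = [("q", "*:*")]
    for position, clause in enumerate(filters, start=1):
        local_params = []
        if not cache:
            local_params.append("cache=false")
        if base_cost is not None:
            local_params.append(f"cost={base_cost + position}")
        prefix = "{!" + " ".join(local_params) + "}" if local_params else ""
        params.append(("fq", prefix + clause))
    return urlencode(params)


def fq_values(query_string):
    """Decode a query string and return its fq values in request order."""
    return parse_qs(query_string).get("fq", [])


if __name__ == "__main__":
    for label, qs in [
        ("cached", build_query(FILTERS)),
        ("uncached", build_query(FILTERS, cache=False)),
        ("uncached with costs", build_query(FILTERS, cache=False, base_cost=100)),
    ]:
        print(label, fq_values(qs))
```

Timing each variant over many different queries, rather than a single one-off, would show whether the cost ordering changes anything for a given index.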

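Matteo's explanation in the quoted message below (two weakly selective filters ANDed as bitsets versus leapfrogging from a filter with one or two hits) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: filters are plain sorted lists of document ids, a Python integer stands in for a bitset, and binary search stands in for an iterator's advance step. It is not how Lucene or Solr implement either path.

```python
from bisect import bisect_left


def and_bitsets(postings, max_doc):
    """Cached-filter style: materialise one bitset per filter, then AND them.

    The work grows with the total number of matching docs across all
    filters, even if the final result is tiny.
    """
    result = (1 << max_doc) - 1  # start with every document set
    for docs in postings:
        bits = 0
        for doc in docs:
            bits |= 1 << doc
        result &= bits
    return [doc for doc in range(max_doc) if (result >> doc) & 1]


def leapfrog(postings):
    """Uncached style: drive from the shortest list and skip ahead in the rest.

    Assumes a non-empty list of sorted doc-id lists. The work grows with
    the size of the most selective filter, not the least selective one.
    """
    lead, *others = sorted(postings, key=len)
    hits = []
    for doc in lead:
        for docs in others:
            i = bisect_left(docs, doc)  # first position with id >= doc
            if i == len(docs) or docs[i] != doc:
                break  # this filter rejects the candidate
        else:
            hits.append(doc)  # every other filter contains the candidate
    return hits


if __name__ == "__main__":
    n_rea = [41, 42]                     # highly selective filter
    provincia = list(range(0, 1000, 2))  # weakly selective filter
    doc_type = list(range(0, 1000, 3))   # weakly selective filter
    print(and_bitsets([n_rea, provincia, doc_type], max_doc=1000))
    print(leapfrog([n_rea, provincia, doc_type]))
```

Both functions return the same documents; the difference is how many entries each has to touch to get there.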
On Tue, 5 Jan 2016, 23:47 Erick Erickson <erickerickson@gmail.com> wrote:

>
> &fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>
> You have a comma in front of the last fq clause, typo?
>
> Well, the whole point of caching filter queries is so that the _second_
> time you use it, very little work has to be done. That comes at a cost of
> course for first-time execution. Basically any fq clause that you can
> guarantee won't be re-used should have cache=false set.
>
> I'd be surprised if the second time you use the provincia and type fq
> clauses not caching would be faster, but I've been surprised before. I
> guess anding two bitsets together could take more time than, say, testing
> a small number of individual documents....
>
> And I'm assuming that you're testing multiple queries rather than just
> one-offs.
>
> If you _do_ know that some of your clauses are very restrictive, I wonder
> what happens if you add a cost in. fq's are evaluated in cost order (when
> cache=false), so what happens in this case?
> &fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false cost=102}provincia:yyyy&fq={!cache=false cost=103}type:zzzz
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 9:41 AM, Matteo Grolla <matteo.grolla@gmail.com>
> wrote:
> > Thanks Erick and Binoy,
> >      This is a case I stumbled upon: with queries like
> >
> > q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
> >
> > where the n_rea filter is highly selective,
> > I got a more than 3x performance improvement by disabling the cache.
> >
> > I think it's because the last two filters are not so selective: with
> > caching enabled they are resolved to two bitsets which are then ANDed
> > together, and this is less efficient than leapfrogging, since the first
> > filter has just one or two results.
> > Does it make sense to you?
> >
> >
> >
> >
> >
> > 2016-01-05 16:59 GMT+01:00 Erick Erickson <erickerickson@gmail.com>:
> >
> >> Matteo:
> >>
> >> Let's see if I understand your problem. Essentially you want
> >> Solr to analyze the filter queries and decide through some
> >> algorithm which ones to cache. I have a hard time thinking of
> >> any general way to do this; certainly there's nothing in Solr
> >> that does this automatically. As Binoy mentions, there are some
> >> ways to influence what goes in the cache, but the algorithm is
> >> simple....
> >>
> >> If you build such a thing, I suspect you'll be implicitly building
> >> in knowledge of how your particular application uses Solr. For
> >> sure, the functionality around "no cache filters" is there explicitly
> >> because some fq clauses (think ACL calculations) can be
> >> very expensive to calculate for the entire corpus (which is what
> >> fqs do by default).
> >>
> >> But you really haven't given us any examples of what sorts
> >> of fq clauses you consider "bad". Perhaps there are other ways
> >> of approaching your problem.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Tue, Jan 5, 2016 at 7:50 AM, Binoy Dalal <binoydalal93@gmail.com>
> >> wrote:
> >> > What is your exact requirement then?
> >> > I ask, because these settings can solve the problems you've mentioned
> >> > without the need to add any additional functionality.
> >> >
> >> > On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <matteo.grolla@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Binoy,
> >> >>      I know these settings but the problem I'm trying to solve is
> >> >> when these settings aren't enough.
> >> >>
> >> >>
> >> >> 2016-01-05 16:30 GMT+01:00 Binoy Dalal <binoydalal93@gmail.com>:
> >> >>
> >> >> > If I understand your problem correctly, you don't want the most
> >> >> > frequently used fqs removed, and you do not want your filter cache
> >> >> > to grow to very large sizes.
> >> >> > Well, there is already a solution for both of these.
> >> >> > In the solrconfig.xml file, you can configure the <filterCache>
> >> >> > element to suit your needs:
> >> >> > a) Use the LeastFrequentlyUsed (LFU) eviction policy.
> >> >> > b) Set the size to whatever number of fqs you find suitable.
> >> >> > You can do this like so:
> >> >> > <filterCache class="solr.LFUCache" size="100" initialSize="10"
> >> >> > autowarmCount="10"/>
> >> >> > You should play around with these parameters to find the best
> >> >> > combination for your implementation.
> >> >> > For more details take a look here:
> >> >> > https://wiki.apache.org/solr/SolrCaching
> >> >> > http://yonik.com/advanced-filter-caching-in-solr/
> >> >> >
> >> >> >
> >> >> > On Tue, Jan 5, 2016 at 7:28 PM Matteo Grolla <matteo.grolla@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi,
> >> >> > >     after looking at the presentation of cloudsearch from Lucene
> >> >> > > Revolution 2014
> >> >> > >
> >> >> > > https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
> >> >> > > (min 17:08)
> >> >> > >
> >> >> > > I realized I'd love to be able to remove the burden of disabling
> >> >> > > filter query caching from developers.
> >> >> > >
> >> >> > > The problem:
> >> >> > > Solr caches filter queries by default.
> >> >> > > a) When there are filter queries that are not reused and a few
> >> >> > > that are, the good ones get evicted unnecessarily.
> >> >> > > b) If the same query has multiple filter queries that are very
> >> >> > > selective, I noticed a big performance improvement when disabling
> >> >> > > the cache.
> >> >> > > c) I'd like to spare developers from deciding what has to be
> >> >> > > cached or not.
> >> >> > >
> >> >> > > The question:
> >> >> > > - Is there anything already working to solve those problems?
> >> >> > >
> >> >> > > What do you think about this?
> >> >> > > - I was thinking of writing a plugin that recognizes query types
> >> >> > > with regular expressions and lets Solr admins associate a caching
> >> >> > > behaviour with each query type.
> >> >> > > - Another idea was to:
> >> >> > >    - set fq caching off by default
> >> >> > >    - keep statistics about fqs
> >> >> > >    - enable caching only for the N fqs with the highest hit ratio
> >> >> > >
> >> >> > --
> >> >> > Regards,
> >> >> > Binoy Dalal
> >> >> >
> >> >>
> >> > --
> >> > Regards,
> >> > Binoy Dalal
> >>
>
-- 
Regards,
Binoy Dalal
