lucene-solr-user mailing list archives

From Brett Hoerner <br...@bretthoerner.com>
Subject Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
Date Tue, 03 Jun 2014 21:02:38 GMT
In this case, I have >400 million documents, so I understand it taking a
while.

That said, I'm still not sure I understand why it would take *more* time.
In your example above, wouldn't it have to create an 11.92MB bitset even if
I *don't* cache the bitset? It seems the mere act of storing the work after
it's done (it has to be done in either case) is taking 4 whole seconds?
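A quick back-of-the-envelope check of the sizes under discussion. This assumes, per Shawn's description, that each cached filter is a plain bitset with one bit per document. The class and method names are made up for illustration; real Solr may store small filters more compactly.

```java
/** Hypothetical helper: size of one filter stored as a full bitset (one bit per doc). */
public class FilterBitsetSize {
    // One bit per document in the index, rounded up to whole bytes.
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Convert bytes to mebibytes (the "MB" figure quoted in the thread).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long[] docCounts = {100_000_000L, 400_000_000L};
        for (long maxDoc : docCounts) {
            long bytes = bitsetBytes(maxDoc);
            System.out.printf("%d docs -> %d bytes (%.2f MiB)%n", maxDoc, bytes, toMiB(bytes));
        }
    }
}
```

At 100 million documents this gives 12,500,000 bytes (about 11.92 MiB), matching Shawn's figure. At 400 million it is 50,000,000 bytes (about 47.68 MiB) per filter, so every full-bitset entry here would be four times that size.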



On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey <solr@elyograg.org> wrote:

> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
> > If I run a query like this,
> >
> > fq=text:lol
> > fq=created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes about 6 seconds. Subsequent queries take only 50ms or less, as
> > expected, because my fqs are cached.
> >
> > However, if I change the query to not cache my big range query:
> >
> > fq=text:lol
> > fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
> >
> > It takes 2 seconds every time, which is a much better experience for my
> > "first query for that range."
> >
> > What's odd to me is that I would expect both of these (first) queries to
> > have to do the same amount of work, except that the first one stuffs the
> > resulting bitset into a map at the end... which seems to have a 4 second
> > overhead?
> >
> > Here's my filterCache from solrconfig:
> >
> >     <filterCache class="solr.FastLRUCache"
> >                  size="64"
> >                  initialSize="64"
> >                  autowarmCount="32"/>
>
> I think that probably depends on how many documents you have in the
> single index/shard.  If you have one hundred million documents stored in
> the Lucene index, then each filter entry is 12500000 bytes (11.92MB) in
> size - it is a bitset representing every document and whether it is
> included or excluded.  That data would need to be gathered and copied
> into the cache.  I suspect that it's the gathering that takes the most
> time ... several megabytes of memory is not very much for a modern
> processor to copy.
>
> As for how long this takes, I actually have no idea.  You have two
> filters here, so it would need to do everything twice.
>
> Thanks,
> Shawn
>
>
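To make "stuffs the resulting bitset into a map" concrete, here is a minimal sketch of a filter cache as an LRU map from fq string to bitset. It assumes a toy 1,000-document index with invented matches. `FilterCacheSketch` and `getOrCompute` are hypothetical names, and this is a `LinkedHashMap` stand-in, not Solr's `FastLRUCache`. It only shows what caching adds once a bitset exists. It does not attempt to model how Solr evaluates a filter marked `{!cache=false}`.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, heavily simplified filter cache (NOT Solr's FastLRUCache). */
public class FilterCacheSketch {
    private final Map<String, BitSet> cache;
    int computations = 0; // number of times a bitset had to be built

    FilterCacheSketch(int maxEntries) {
        // accessOrder = true: iteration runs least-recently-used first, so
        // removeEldestEntry evicts the LRU entry once size exceeds maxEntries.
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    BitSet getOrCompute(String fq, Supplier<BitSet> compute) {
        BitSet bits = cache.get(fq);
        if (bits == null) {
            bits = compute.get(); // the expensive part: finding every matching doc
            computations++;
            cache.put(fq, bits);  // the "caching" itself: a single reference insert
        }
        return bits;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        final int maxDoc = 1_000; // toy index, not real data
        FilterCacheSketch fc = new FilterCacheSketch(64); // mirrors size="64"
        Supplier<BitSet> textLol = () -> {
            BitSet b = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc += 3) b.set(doc); // toy matches
            return b;
        };
        Supplier<BitSet> range = () -> {
            BitSet b = new BitSet(maxDoc);
            b.set(100, 200); // toy matches: docs 100..199
            return b;
        };
        for (int run = 1; run <= 2; run++) {
            // Clone before intersecting so the cached bitset is never mutated.
            BitSet hits = (BitSet) fc.getOrCompute("text:lol", textLol).clone();
            hits.and(fc.getOrCompute("created_at_tdid:[1400544000 TO 1400630400]", range));
            System.out.println("run " + run + ": hits=" + hits.cardinality()
                    + ", bitsets built so far=" + fc.computations);
        }
    }
}
```

In this toy, the first run builds two bitsets, one per fq. That echoes Shawn's point that two filters mean doing the work twice. The second run builds none. The `put` itself is a constant-time reference store.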
