lucene-solr-user mailing list archives

From Brett Hoerner <br...@bretthoerner.com>
Subject Re: Is the act of *caching* an fq very expensive? (seems to cost 4 seconds in my example)
Date Tue, 03 Jun 2014 21:06:58 GMT
This seems to be where it decides whether to use the cache; the extra
work is really just a get (a miss) and a put:


https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1216

I suppose it's possible the put is taking 4 seconds, but that seems...
surprising to me.
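
To make the "get (miss) and a put" point concrete, here is a minimal sketch of that lookup pattern. It is not Solr's implementation: `TinyLRUCache`, `compute_doc_set`, and `get_doc_set` are hypothetical names, and the sketch assumes, as the message does, that the filter result is computed the same way whether or not it is cached.

```python
from collections import OrderedDict

class TinyLRUCache:
    """Hypothetical stand-in for a filter cache; not Solr's FastLRUCache."""

    def __init__(self, size):
        self.size = size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None  # cache miss
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # evict least recently used

calls = {"n": 0}  # counts how often the expensive step runs

def compute_doc_set(fq):
    """Placeholder for the expensive work of evaluating a filter query."""
    calls["n"] += 1
    return bytearray(16)  # stand-in for a per-document bitset

def get_doc_set(cache, fq, use_cache=True):
    if not use_cache:
        return compute_doc_set(fq)  # analogous to {!cache=false}
    cached = cache.get(fq)
    if cached is not None:
        return cached  # hit: no recomputation
    result = compute_doc_set(fq)  # miss: do the work once...
    cache.put(fq, result)  # ...then store it for later queries
    return result
```

Under that assumption, the cached path differs from the uncached one only by a map lookup and a map insert, which is why a 4 second difference looks surprising.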


On Tue, Jun 3, 2014 at 4:02 PM, Brett Hoerner <brett@bretthoerner.com>
wrote:

> In this case, I have >400 million documents, so I understand it taking a
> while.
>
> That said, I'm still not sure I understand why it would take *more* time.
> In your example above, wouldn't it have to create an 11.92MB bitset even if
> I *don't* cache the bitset? It seems the mere act of storing the work after
> it's done (it has to be done in either case) is taking 4 whole seconds?
>
>
>
> On Tue, Jun 3, 2014 at 3:59 PM, Shawn Heisey <solr@elyograg.org> wrote:
>
>> On 6/3/2014 2:44 PM, Brett Hoerner wrote:
>> > If I run a query like this,
>> >
>> > fq=text:lol
>> > fq=created_at_tdid:[1400544000 TO 1400630400]
>> >
>> > It takes about 6 seconds. Following queries take only 50ms or less, as
>> > expected because my fqs are cached.
>> >
>> > However, if I change the query to not cache my big range query:
>> >
>> > fq=text:lol
>> > fq={!cache=false}created_at_tdid:[1400544000 TO 1400630400]
>> >
>> > It takes 2 seconds every time, which is a much better experience for my
>> > "first query for that range."
>> >
>> > What's odd to me is that I would expect both of these (first) queries to
>> > have to do the same amount of work, except the first one stuffs the
>> > resulting bitset into a map at the end... which seems to have a 4 second
>> > overhead?
>> >
>> > Here's my filterCache from solrconfig:
>> >
>> >     <filterCache class="solr.FastLRUCache"
>> >                  size="64"
>> >                  initialSize="64"
>> >                  autowarmCount="32"/>
>>
>> I think that probably depends on how many documents you have in the
>> single index/shard.  If you have one hundred million documents stored in
>> the Lucene index, then each filter entry is 12500000 bytes (11.92MB) in
>> size - it is a bitset representing every document and whether it is
>> included or excluded.  That data would need to be gathered and copied
>> into the cache.  I suspect that it's the gathering that takes the most
>> time ... several megabytes of memory is not very much for a modern
>> processor to copy.
>>
>> As for how long this takes, I actually have no idea.  You have two
>> filters here, so it would need to do everything twice.
>>
>> Thanks,
>> Shawn
>>
>>
>
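
The bitset sizes discussed in the thread can be checked with a short calculation. This is a sketch that assumes one bit per document in the index, as described in the quoted reply; `filter_bitset_bytes` and `to_mib` are illustrative names, not Solr APIs.

```python
def filter_bitset_bytes(max_doc: int) -> int:
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (max_doc + 7) // 8

def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

# 100 million is the example from the reply; 400 million is the lower
# bound on the index size given in the follow-up message.
for docs in (100_000_000, 400_000_000):
    size = filter_bitset_bytes(docs)
    print(f"{docs:>11,} docs -> {size:,} bytes ({to_mib(size):.2f} MiB)")
```

For 100 million documents this gives 12,500,000 bytes (about 11.92 MiB), matching the figure in the reply. For 400 million documents, each cached filter entry would be 50,000,000 bytes (about 47.68 MiB).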
