lucene-solr-user mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: Long string in fq value parameter, more than 2000000 chars
Date Sat, 27 May 2017 20:56:59 GMT
Another technique to consider is {!join}.  If there are stable sets of
ids, index the cross-reference id "sets" to another core and use a short
and sweet join.
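A sketch of what I mean (core and field names invented here): store each
stable id set as a document in a separate "idsets" core, with a
multivalued member_id field holding the ids in the set, then filter the
main collection with something like

   fq={!join fromIndex=idsets from=member_id to=id}set_name:set42

That keeps each request tiny, and repeated filters can be served from the
filterCache instead of re-parsing a multi-megabyte parameter.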

   Erik

> On May 27, 2017, at 11:39, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
> 
> On top of Shawn's analysis, I am also wondering how often those FQ
> queries are reused. They and the matching documents get cached, so
> there might be quite a bit of space taken up by that too.
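> If they are mostly one-off filters, something like
> fq={!terms f=id cache=false}... (field name just an example) may keep
> them from filling the filterCache, though that is worth verifying on
> your Solr version.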
> 
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
> 
> 
>> On 27 May 2017 at 11:32, Shawn Heisey <apache@elyograg.org> wrote:
>>> On 5/27/2017 9:05 AM, Shawn Heisey wrote:
>>>> On 5/27/2017 7:14 AM, Daniel Angelov wrote:
>>>> I would like to ask what the memory/CPU impact could be if the fq
>>>> parameter in many of the queries is a long string (fq={!terms
>>>> f=...}...,.... ) of around 2,000,000 chars. Most of the queries are
>>>> like: "q={!frange l=Timestamp1 u=Timestamp2}... + some other
>>>> criteria". This is with SolrCloud 4.1, on 10 hosts, with 3
>>>> collections holding around 10,000,000 docs in total. The queries go
>>>> across all 3 collections.
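>>>> For illustration (field names changed, values shortened), a request
>>>> looks roughly like:
>>>>
>>>>   q={!frange l=1495800000000 u=1495890000000}ms(Timestamp)
>>>>   fq={!terms f=RefId}id000001,id000002,...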
>> 
>> Followup after a little more thought:
>> 
>> If we assume that the terms in your filter query are a generous 15
>> characters each (plus a comma), that means there are in the ballpark of
>> 125 thousand of them in a two-million-character filter query.  If
>> they're smaller, then there would be more.  Considering 56 bytes of
>> overhead for each one, there's at least another 7 million bytes of
>> memory for 125000 terms when the terms parser divides that filter into
>> multiple String objects, plus the memory required for the data in each
>> of those small strings, which will be just a little less than the
>> original four million bytes (Java stores each char as two bytes),
>> because it will exclude the commas.  A fair amount of garbage will
>> probably also be generated in order to parse the filter ... and then
>> once the query is done, the 15 megabytes (or more) of memory for the
>> strings will also be garbage.  This is going to repeat for every shard.
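>> (Rough tally, assuming Java's two-byte chars and ~56 bytes of object
>> overhead per String: ~4 MB for the original parameter, ~7 MB of
>> overhead for 125,000 small strings, and ~4 MB for their character
>> data, or roughly 15 MB per shard, per query.)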
>> 
>> I haven't even discussed the memory requirements of the frange parser,
>> because I don't have any idea what those are, and you didn't describe
>> the function you're using.  I also don't know how much memory Lucene
>> will require to execute a terms filter with at least 125K terms.  I
>> don't imagine it will be small.
>> 
>> Thanks,
>> Shawn
>> 
