lucene-solr-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: filter in JSON Query DSL
Date Mon, 30 Sep 2019 13:26:06 GMT
Jochen, right! Sorry I didn't get your point earlier. {!bool filter=}
means a Lucene filter, not Solr's cached filter. I suppose a {!bool cache=true}
flag could be added easily, but so far there is no concise syntax for it.
Don't hesitate to raise a Jira issue for it.
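
For readers unfamiliar with the notation: {!bool filter=...} is the local-params form of the Boolean query parser, which is what a JSON "bool" object is handed to. The following is a minimal sketch that renders a clause dictionary in that form. The helper name `to_bool_local_params` is hypothetical, it refuses values that would need escaping, and it is only meant to illustrate the syntax, not to reproduce Solr's own JSON conversion.

```python
CLAUSES = ("must", "must_not", "should", "filter")


def to_bool_local_params(clauses):
    """Render {'must': ..., 'filter': [...]} as a {!bool ...} string (illustrative only)."""
    unknown = set(clauses) - set(CLAUSES)
    if unknown:
        raise ValueError(f"unknown clause(s): {sorted(unknown)}")
    parts = []
    for name in CLAUSES:
        values = clauses.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            # Escaping of quotes and backslashes is out of scope for this sketch.
            if "'" in value or "\\" in value:
                raise ValueError("value needs escaping, not handled here")
            parts.append(f"{name}='{value}'")
    return "{!bool " + " ".join(parts) + "}"


print(to_bool_local_params({"must": "*:*", "filter": ["meta_subject_txt:globe"]}))
```

The "filter" key here ends up as a filter clause of the Lucene BooleanQuery, which is the distinction drawn above.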

On Mon, Sep 30, 2019 at 3:18 PM Jochen Barth <barth@ub.uni-heidelberg.de>
wrote:

> Here is the corrected equivalent query, which gives the same results as the
> JsonQueryDSL version (and is still much faster):
>
> +filter(+((_query_:"{!graph from=parent_ids to=id }(meta_title_txt:muller
> meta_name_txt:muller meta_subject_txt:muller meta_shelflocator_txt:muller)"
> _query_:"{!graph from=id to=parent_ids  traversalFilter=\"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
> text_abstract_ft:muller text_pdf_ft:muller)") ) +class_s:meta )
> -_query_:"{!join to=id from=parent_ids}(filter(+((_query_:\"{!graph
> from=parent_ids to=id }(meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller meta_shelflocator_txt:muller)\" _query_:\"{!graph
> from=id to=parent_ids  traversalFilter=\\\"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal\\\"}(meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
> text_abstract_ft:muller text_pdf_ft:muller)\") ) +class_s:meta ))"
>
> I first run the "core" of the above query (the string before
> »-_query_:"{!join«) for faceting;
> then the next query is the full one above [ like »+(a) -{!join...}(a)« ]
>
> Now the second query is running in much less time because the result of
> term "a" is cached.
>
> Caching does not seem to work with {boolean=>{must=>"*:*", filter=>...}}.
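
The »+(a) -{!join...}(a)« pattern needs the core query "a" to be embedded once more inside `_query_:"..."`, which is where the `\"` and `\\\"` escapes in the query above come from. The following is a minimal sketch of that string assembly. The function names `embed` and `with_join_exclusion` are hypothetical, and the shortened core query is a stand-in for the full one quoted above.

```python
def embed(query):
    """Wrap a query as _query_:"..." and escape its content one level deeper."""
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"' + escaped + '"'


def with_join_exclusion(core, join_from="parent_ids", join_to="id"):
    """Build +filter(core) -_query_:"{!join ...}(filter(core))"."""
    cached = f"filter({core})"
    join = f"{{!join to={join_to} from={join_from}}}({cached})"
    return f"+{cached} -{embed(join)}"


# Shortened stand-in for the graph query used in this thread.
graph = embed(
    '{!graph from=id to=parent_ids traversalFilter="class_s:meta -type_s:issue"}'
    "(meta_title_txt:muller)"
)
core = f"+({graph}) +class_s:meta"
print(with_join_exclusion(core))
```

Both occurrences of filter(...) wrap the same core string before escaping, which is the precondition for the second one to find the first one's cache entry.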
>
> Kind regards,
> Jochen
>
>
>
>
>
>
> On 30.09.19 at 11:02, Jochen Barth wrote:
>
> Oops... the JSON query returns 48652 docs, the StandardQueryParser query 827...
>
> I must check this.
>
> Sorry,
>
> Jochen
>
> On 30.09.19 at 10:39, Jochen Barth wrote:
>
> The *:* appears twice in the JsonQueryDSL version because »filter(...)«
> appears twice in the StandardQueryParser version.
>
>
>
> I added some System.out.println calls to FastLRUCache, LRUCache and LFUCache;
> here is the logging with JsonQueryDSL (Solr 8.1.1):
>
> Fast-get +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
> +class_s:meta) valLen=null
>
> Fast-get DocValuesFieldExistsQuery [field=id] valLen=38
>
> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=38
>
> Fast-put +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
> +class_s:meta)
>
> ...
>
> Fast(LRUCache)-get is called only once, but it should have been called
> twice: once to find out that this filter is not yet cached, and a
> second time for the identical part of the subquery.
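
The expected access pattern (get with a miss, put, then get with a hit for the identical sub-query) can be illustrated with a toy cache. This is a sketch only: `LoggingLRUCache` and `cached_filter` are hypothetical names, the string key and the document set are made up, and none of this is Solr's actual cache code.

```python
from collections import OrderedDict


class LoggingLRUCache:
    """Tiny LRU cache that records every get and put, like the println logging."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.log = []

    def get(self, key):
        value = self.entries.get(key)
        if value is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        self.log.append(("get", key, "hit" if value is not None else "miss"))
        return value

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.log.append(("put", key))


def cached_filter(cache, key, compute):
    """Look the filter up first; compute and store it only on a miss."""
    docs = cache.get(key)
    if docs is None:
        docs = compute()
        cache.put(key, docs)
    return docs


cache = LoggingLRUCache()
a = "+(graph-part) +class_s:meta"
cached_filter(cache, a, lambda: {1, 2, 3})  # first use of "a"
cached_filter(cache, a, lambda: {1, 2, 3})  # identical sub-query inside the join
for entry in cache.log:
    print(entry)
```

A hit on the second lookup requires the two keys to compare equal, so a key that also contains the surrounding clauses would not match the bare sub-query.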
>
>
> Now analyzing the cache access with StandardQueryParser:
> Fast-get +(+[[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> +[[meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>  -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
> +class_s:meta valLen=null
> Fast-get DocValuesFieldExistsQuery [field=id] valLen=null
> Fast-put DocValuesFieldExistsQuery [field=id]
> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=null
> Fast-put DocValuesFieldExistsQuery [field=parent_ids]
> Fast-put +(+[[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> +[[meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>  -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
> +class_s:meta
> Fast-get +filter(+(+(+[[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> +[[meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
> +class_s:meta) valLen=null
> Fast-get +(+[[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> +[[meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>  -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
> +class_s:meta valLen=40
> Fast-put +filter(+(+(+[[meta_title_txt:muller meta_name_txt:muller
> meta_subject_txt:muller
> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
> +[[meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
> +class_s:meta)
>
> Fast(LRUCache)-get +(+([[... is called twice, as expected.
>
> Kind regards,
> Jochen
>
>
>
> On 30.09.19 at 10:01, Jochen Barth wrote:
>
> Dear Mikhail,
>
> maybe I am wrong,
>
> but this query (standardQueryParser):
>
> +filter(+((+((+(_query_:"{!graph from=parent_ids to=id
> }(meta_title_txt:muller meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller)") +(_query_:"{!graph from=id to=parent_ids
> traversalFilter=\"class_s:meta -type_s:multivolume_work -type_s:periodical
> -type_s:issue -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
> text_abstract_ft:muller text_pdf_ft:muller)"))))) +(class_s:meta))
> -(+(_query_:"{!join from=parent_ids
> to=id}(+filter(+((+((+(_query_:\"{!graph from=parent_ids to=id
> }(meta_title_txt:muller meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller)\") +(_query_:\"{!graph from=id to=parent_ids
> traversalFilter=\\\"class_s:meta -type_s:multivolume_work
> -type_s:periodical -type_s:issue -type_s:journal\\\"}(meta_title_txt:muller
> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller text_pdf_ft:muller)\")))))
> +(class_s:meta)))"))
>
> is twice as fast as this equivalent one (JsonQueryDSL, "canonical" for
> stable key order):
>
> {"query":{"bool":{"filter":{"bool":{"must":[{"bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
> meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*","must_not":[{"join":{"from":"parent_ids","query":{"bool":{"filter":{"bool":{"must":[{"bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
> meta_name_txt:muller meta_subject_txt:muller
> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
> text_tei_ft:muller text_abstract_ft:muller
> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
> -type_s:multivolume_work -type_s:periodical -type_s:issue
> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*"}},"to":"id"}}]}}}
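
A "canonical" serialization with stable key order, like the request above, can be produced as follows. This is a sketch: `canonical_json` is a hypothetical helper name and the two small queries are made-up examples. Whether Solr's cache lookup is sensitive to JSON key order is the open question in this thread, not something the snippet establishes.

```python
import json


def canonical_json(obj):
    """Serialize with sorted keys and no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Same query, keys written in a different order.
q1 = {"query": {"bool": {"must": "*:*", "filter": ["class_s:meta"]}}}
q2 = {"query": {"bool": {"filter": ["class_s:meta"], "must": "*:*"}}}

print(canonical_json(q1))
print(canonical_json(q1) == canonical_json(q2))
```

Only object keys are reordered; the order of list elements, such as the clauses inside "must", is kept as written.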
>
> Kind regards,
> Jochen
>
>
>
> On 29.09.19 at 21:28, Mikhail Khludnev wrote:
>
> On Sun, Sep 29, 2019 at 8:37 PM Barth, Jochen <Barth@ub.uni-heidelberg.de>
> wrote:
>
>
> Thanks for your hint. The documentation does not say if the result of
> filter is cached here (like fq=...) (I could test this).
>
>
> 'filter' implies caching.
>
>
>
> Is *:* more expensive (at query time) than filter()? (*:* is not required in
> StandardQueryParser.)
>
>
> Either I don't get the question, or it isn't worth worrying about.
>
>
>
> Kind regards,
> Jochen
>
> ________________________________________
> From: Mikhail Khludnev <mkhl@apache.org>
> Sent: Saturday, 28 September 2019 22:58
> To: solr-user
> Subject: Re: filter in JSON Query DSL
>
> Given
> https://lucene.apache.org/solr/guide/8_0/other-parsers.html#boolean-query-parser
> it should be something like
> '{"query": { "bool": { "must": ["*:*"] , "filter": [
> "meta_subject_txt:globe" ] } } }'
> I'm not sure why you would put filter under must; they should be siblings.
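
The two request shapes discussed in this thread can be built side by side: the top-level "filter" key from the original post (the JSON Request API counterpart of fq) and the sibling form suggested here, where "filter" sits next to "must" inside "bool". This is a minimal sketch; the function names are hypothetical and nothing is sent to a server.

```python
import json


def top_level_filter(main_query, filters):
    """Filters as a top-level key of the request body."""
    return {"query": {"bool": {"must": main_query}}, "filter": list(filters)}


def sibling_filter(main_query, filters):
    """Filters as a sibling of 'must' inside the bool query."""
    return {"query": {"bool": {"must": [main_query], "filter": list(filters)}}}


for body in (
    top_level_filter("*:*", ["meta_subject_txt:globe"]),
    sibling_filter("*:*", ["meta_subject_txt:globe"]),
):
    print(json.dumps(body))
```

As the latest reply in this thread points out, the bool-level "filter" is a Lucene filter clause rather than Solr's cached filter, so the two shapes are not interchangeable where caching matters.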
>
> On Fri, Sep 27, 2019 at 4:34 PM Jochen Barth <barth@ub.uni-heidelberg.de>
> wrote:
>
>
> Dear reader,
>
> this query works as expected:
>
> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
> {"query": { "bool": { "must": "*:*" } },
> "filter": [ "meta_subject_txt:globe" ] }'
>
> this does not (nor does it work without the curly braces around "filter"):
>
> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
> {"query": { "bool": { "must": [ "*:*", { "filter": [
> "meta_subject_txt:globe" ] } ] } } }'
>
> Is "filter" within deeper queries possible?
>
> I've got some complex queries with a "kernel" somewhat below the top
> level...
>
> Is "canonical" JSON important for matching a query cache entry?
>
> Would it help to serialize these queries to standard syntax and then use
> filter(...)?
>
> Kind regards,
>
> Jochen
>
>
>
> --
> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221
> 54-2580
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>
>

-- 
Sincerely yours
Mikhail Khludnev
