lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Re[2]: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)
Date Tue, 29 Mar 2016 17:40:39 GMT
Alisa,

There is no such thing as child.facet.limit, etc.
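
Since the BlockJoin facet response quoted below comes back as a flat list of alternating terms and counts, one possible workaround (not something proposed in this thread, only a client-side sketch) is to apply the limit and mincount after the fact. The helper name top_child_facets and its parameters are hypothetical; the sample values are copied from the response further down.

```python
from typing import List, Tuple, Union


def top_child_facets(
    flat: List[Union[str, int]], limit: int = 10, mincount: int = 1
) -> List[Tuple[str, int]]:
    """Trim a flat [term, count, term, count, ...] facet list on the client.

    Hypothetical helper: it only post-processes a response that has already
    been fetched and does not change what Solr computes.
    """
    # Even positions hold terms, odd positions hold their counts.
    pairs = list(zip(flat[0::2], flat[1::2]))
    # Drop terms below the minimum count.
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Most frequent first; ties are broken alphabetically for stable output.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:limit]


# A few entries from the "text_t" list in the quoted response.
sample = ["128x", 1, "access", 5, "california", 13, "enron", 9, "ab", 2]
result = top_child_facets(sample, limit=10, mincount=5)
```

This still transfers the full term list from Solr, so it may be impractical for fields with very many distinct terms.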

On Tue, Mar 29, 2016 at 6:27 PM, Alisa Z. <proloxx@mail.ru> wrote:

>  So the first issue was eventually solved by adding facet: {top_terms_by_doc:
> "unique(_root_)"} and sorting the outer facet buckets by that sub-facet:
>
> curl http://localhost:8985/solr/enron_path_w_ts/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> json.facet={
>   filter_by_child_type :{
>     type:query,
>     q:"type_s:doc.enriched.text.keywords",
>     domain: { blockChildren : "type_s:doc" },
>     facet:{
>       top_keywords_text : {
>         type: terms,
>         field: text_t,
>         limit: 10,
>         sort: "top_terms_by_doc desc",
>          facet: {
>            top_terms_by_doc: "unique(_root_)"
>          }
>       }
>     }
>   }
> }'
>
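
For reference, the same request can be assembled in Python so that the URL-encoding of the "+" and the quoting inside json.facet are handled by the standard library. This is only a sketch: build_facet_params is a hypothetical helper, and the encoded body would be POSTed to the collection's /query handler just as the curl command above does.

```python
import json
from urllib.parse import urlencode


def build_facet_params(parent_filter, child_query, child_type_query, field, limit=10):
    """Build the request parameters for the JSON Facet query quoted above.

    Hypothetical helper; the facet names mirror the ones used in the thread.
    """
    facet = {
        "filter_by_child_type": {
            "type": "query",
            "q": child_type_query,
            # Switch the facet domain from matched parents to their children.
            "domain": {"blockChildren": parent_filter},
            "facet": {
                "top_keywords_text": {
                    "type": "terms",
                    "field": field,
                    "limit": limit,
                    # Order buckets by distinct parent docs, not child hits.
                    "sort": "top_terms_by_doc desc",
                    "facet": {"top_terms_by_doc": "unique(_root_)"},
                }
            },
        }
    }
    return {
        "q": '{!parent which="%s"}%s' % (parent_filter, child_query),
        "rows": "0",
        "json.facet": json.dumps(facet),
    }


params = build_facet_params(
    "type_s:doc",
    "type_s:doc.userData +Subject_t:california",
    "type_s:doc.enriched.text.keywords",
    "text_t",
)
# Form-encoded body, equivalent to what `curl -d` sends.
body = urlencode(params)
```
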
>
> The BlockJoin Faceting part is still open: I've tried all the conventional
> faceting parameters: facet.limit, child.facet.limit, f.text_t.facet.limit
> ... nothing worked :(
>
>
> >Monday, 28 March 2016, 17:20 -04:00 from Alisa Z. <proloxx@mail.ru>:
> >
> >Ok, so for the 1st question, I think I'm getting closer: adding facet:
> >{top_terms_by_doc: "unique(_root_)"} as indicated in
> >http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev returns
> >correct counts. However, the buckets are still sorted by the outer facet's
> >count, not by unique(_root_):
> >
> >
> >curl http://localhost:8985/solr/my_collection/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> >json.facet={
> >  filter_by_child_type :{
> >    type:query,
> >    q:"type_s:doc.enriched.text.keywords",
> >    domain: { blockChildren : "type_s:doc" },
> >    facet:{
> >      top_keywords_text : {
> >        type: terms,
> >        field: text_t,
> >        limit: 10,
> >        facet: {
> >           top_terms_by_doc: "unique(_root_)"
> >         }
> >      }
> >    }
> >  }
> >}'
> >
> >RETURNS
> >
> >{
> >  "responseHeader":{
> >    "status":0,
> >    "QTime":25,
> >    "params":{
> >      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
> +Subject_t:california",
> >      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n
> q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren :
> \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type:
> terms,\n        field: text_t,\n        limit: 10,\n        facet:
> {\n           top_terms_by_doc: \"unique(_root_)\"\n         }\n
> }\n    }\n  }\n}",
> >      "rows":"0"}},
> >  "response":{"numFound":19,"start":0,"docs":[]
> >  },
> >  "facets":{
> >    "count":19,
> >    "filter_by_child_type":{
> >      "count":686,
> >      "top_keywords_text":{
> >        "buckets":[{
> >            "val":"enron",
> >            "count":57,
> >            "top_terms_by_doc":9},
> >          {
> >            "val":"california",
> >            "count":22,
> >            "top_terms_by_doc":13},
> >          {
> >            "val":"power",
> >            "count":21,
> >            "top_terms_by_doc":7},
> >          {
> >            "val":"rate",
> >            "count":15,
> >            "top_terms_by_doc":5},
> >          {
> >            "val":"plan",
> >            "count":13,
> >            "top_terms_by_doc":3},
> >          {
> >            "val":"hou",
> >            "count":12,
> >            "top_terms_by_doc":5},
> >          {
> >            "val":"energy",
> >            "count":11,
> >            "top_terms_by_doc":5},
> >          {
> >            "val":"na",
> >            "count":11,
> >            "top_terms_by_doc":5},
> >          {
> >            "val":"mckinsey",
> >            "count":10,
> >            "top_terms_by_doc":1},
> >          {
> >            "val":"socal",
> >            "count":10,
> >            "top_terms_by_doc":4}]}}}}
> >
> >Nice, but I want them to be ordered by "top_terms_by_doc" frequencies,
> not by the "count" frequencies.
> >Any suggestions?
> >
> >Thanks,
> >Alisa
> >
> >
> >
> >
> >
> >>Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. <proloxx@mail.ru>:
> >>
> >>Hi all,
> >>
> >>I am trying to facet parent docs by nested document fields. I've tried
> >>the 2 approaches named in the subject, yet with the first the results are
> >>not quite correct and with the 2nd I cannot get the query right. So I need
> >>help on either of them, and any explanation, documentation, or blog posts
> >>on this behavior would be much appreciated.
> >>
> >>In words, the query is as follows: "Find the top 10 keywords for all
> >>documents with "california" in the email subject line"
> >>
> >>Here is the query with responses:
> >>
> >>==== Json Facet API ====
> >>
> >>curl http://localhost:8985/solr/my_collection/query -d
> 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
> >>json.facet={
> >>  filter_by_child_type :{
> >>    type:query,
> >>    q:"type_s:doc.enriched.text.keywords",
> >>    domain: { blockChildren : "type_s:doc" },
> >>    facet:{
> >>      top_keywords_text : {
> >>        type: terms,
> >>        field: text_t,
> >>        limit: 10
> >>      }
> >>    }
> >>  }
> >>}'
> >>
> >>RETURNS:
> >>
> >>{
> >>  "responseHeader":{
> >>    "status":0,
> >>    "QTime":134,
> >>    "params":{
> >>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData
> +Subject_t:california",
> >>      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n
> q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren :
> \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type:
> terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
> >>      "rows":"0"}},
> >>  "response":{"numFound":19,"start":0,"docs":[]
> >>  },
> >>  "facets":{
> >>    "count":19,
> >>    "filter_by_child_type":{
> >>      "count":686,
> >>      "top_keywords_text":{
> >>        "buckets":[{
> >>            "val":"enron",
> >>            "count":57},
> >>          {
> >>            "val":"california",
> >>            "count":22},
> >>          {
> >>            "val":"power",
> >>            "count":21},
> >>          {
> >>            "val":"rate",
> >>            "count":15},
> >>          {
> >>            "val":"plan",
> >>            "count":13},
> >>          {
> >>            "val":"hou",
> >>            "count":12},
> >>          {
> >>            "val":"energy",
> >>            "count":11},
> >>          {
> >>            "val":"na",
> >>            "count":11},
> >>          {
> >>            "val":"mckinsey",
> >>            "count":10},
> >>          {
> >>            "val":"socal",
> >>            "count":10}]}}}}
> >>
> >>
> >>QUESTION: where do the counts greater than 19 (the total number of
> >>top-level documents returned by the query) come from? How can I adjust the
> >>query to facet only on the top-level documents (so that no count is
> >>greater than 19)?
> >>
> >>
> >>===== BlockJoin Faceting ======
> >>Following the example on
> https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting ,
> I've tried this:
> >>
>
> >>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
> >>
> >>RETURNS:
> >>
> >>{
> >>  "responseHeader":{
> >>    "status":0,
> >>    "QTime":1},
> >>  "response":{"numFound":19,"start":0,"docs":[]
> >>  },
> >>  "facet_counts":[
> >>    "facet_fields",[
> >>      "text_t",[
> >>        "128x",1,
> >>        "18xx",1,
> >>        "1x",1,
> >>        "2",2,
> >>        "30",1,
> >>        "60",1,
> >>        "78xx",1,
> >>        "82xx",1,
> >>        "ab",2,
> >>        "access",5,
> >>        "account",1,
> >>        "accounts",1,
> >>...
> >>"california",13,
> >>...
> >>"enron",9,
> >>...
> >>]]]}
> >>
> >>QUESTION: This looks very close to what I want, yet why are
> >>child.facet.limit=10&child.facet.mincount=5 ignored? How do I get the top 10
> >>most frequent?
> >>
> >>
> >>Thank you for your help in advance!
> >>
> >>--
> >>Alisa Zhila
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
