lucene-solr-user mailing list archives

From Alisa Z. <prol...@mail.ru>
Subject [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)
Date Mon, 28 Mar 2016 19:39:07 GMT
Hi all,

I am trying to facet parent documents by fields of their nested (child) documents. I have tried
the two approaches named in the subject: with the first, the results are not quite correct, and
with the second, I cannot get the query right. I need help with either of them, and any
explanation, documentation, or blog posts on this behavior would be much appreciated.

In words, the query is: "Find the top 10 keywords for all documents with "california"
in the email subject line."

Here are the queries with their responses:

==== JSON Facet API ====

curl http://localhost:8985/solr/my_collection/query -d 'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10
      }
    }
  }
}'

RETURNS:  

{
  "responseHeader":{
    "status":0,
    "QTime":134,
    "params":{
      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
      "rows":"0"}},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facets":{
    "count":19,
    "filter_by_child_type":{
      "count":686,
      "top_keywords_text":{
        "buckets":[{
            "val":"enron",
            "count":57},
          {
            "val":"california",
            "count":22},
          {
            "val":"power",
            "count":21},
          {
            "val":"rate",
            "count":15},
          {
            "val":"plan",
            "count":13},
          {
            "val":"hou",
            "count":12},
          {
            "val":"energy",
            "count":11},
          {
            "val":"na",
            "count":11},
          {
            "val":"mckinsey",
            "count":10},
          {
            "val":"socal",
            "count":10}]}}}}
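
The curl request above hand-encodes the space and the plus sign in the q parameter (%20 and %2B). As a rough sketch, the same POST body could be assembled in Python so that the encoding is done by the standard library. The host, port, and collection are copied from the curl command; the helper name build_facet_request_body is hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Assumption: same endpoint as in the curl command above.
SOLR_QUERY_URL = "http://localhost:8985/solr/my_collection/query"

# The json.facet block, copied from the request above.
JSON_FACET = """{
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10
      }
    }
  }
}"""


def build_facet_request_body():
    """Return the form-encoded POST body (hypothetical helper)."""
    params = {
        # Written unencoded here; urlencode turns "+" into %2B and spaces into "+".
        "q": '{!parent which="type_s:doc"}type_s:doc.userData +Subject_t:california',
        "rows": "0",
        "json.facet": JSON_FACET,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(SOLR_QUERY_URL)
    print(build_facet_request_body())
```

The resulting string can be sent as the body of a form-encoded POST to SOLR_QUERY_URL with any HTTP client.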


QUESTION: Where do the counts greater than 19 (the total number of top-level documents
returned by the query) come from? How can I adjust the query to facet only on the top-level
documents, so that no count is greater than 19?
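
To make the mismatch concrete, here is a small Python sketch that lists the buckets whose count exceeds numFound. The data is a trimmed copy of the response above (first four buckets only), and the function name buckets_exceeding_parents is hypothetical. Note that filter_by_child_type reports a count of 686 against 19 parents, which suggests the terms facet is counting child documents rather than parents.

```python
# Trimmed copy of the response shown above (first four buckets only).
response = {
    "response": {"numFound": 19},
    "facets": {
        "count": 19,
        "filter_by_child_type": {
            "count": 686,
            "top_keywords_text": {
                "buckets": [
                    {"val": "enron", "count": 57},
                    {"val": "california", "count": 22},
                    {"val": "power", "count": 21},
                    {"val": "rate", "count": 15},
                ]
            },
        },
    },
}


def buckets_exceeding_parents(resp):
    """Return (val, count) pairs whose count is larger than the number of matched parents."""
    num_parents = resp["response"]["numFound"]
    buckets = resp["facets"]["filter_by_child_type"]["top_keywords_text"]["buckets"]
    return [(b["val"], b["count"]) for b in buckets if b["count"] > num_parents]


if __name__ == "__main__":
    for val, count in buckets_exceeding_parents(response):
        print(val, count)
```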


==== BlockJoin Faceting ====
Based on the example at https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting
(BlockJoin Faceting), I tried this:

/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true

RETURNS: 

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facet_counts":[
    "facet_fields",[
      "text_t",[
        "128x",1,
        "18xx",1,
        "1x",1,
        "2",2,
        "30",1,
        "60",1,
        "78xx",1,
        "82xx",1,
        "ab",2,
        "access",5,
        "account",1,
        "accounts",1,
...
"california",13,
...
"enron",9,
...
]]]}

QUESTION: This looks very close to what I want, but why are child.facet.limit=10 and
child.facet.mincount=5 ignored? How can I get the top 10 most frequent terms?
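
Until the parameters work, a client-side fallback is possible. The sketch below pairs up the flat [term, count, term, count, ...] list from the response, drops terms below a minimum count, and keeps the most frequent ones. This is only a workaround under the assumption that the full list is returned; it does not explain why the parameters are ignored. The function name top_terms is hypothetical, and the sample contains only the entries visible above (the elided "..." entries are left out).

```python
def top_terms(flat_counts, limit=10, mincount=5):
    """Pair up a flat [term, count, ...] list and keep the most frequent terms."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    kept = [(term, count) for term, count in pairs if count >= mincount]
    # Highest count first; ties are broken alphabetically by term.
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    return kept[:limit]


# Only the entries visible in the response above.
sample = [
    "128x", 1, "18xx", 1, "1x", 1, "2", 2, "30", 1, "60", 1,
    "78xx", 1, "82xx", 1, "ab", 2, "access", 5, "account", 1,
    "accounts", 1, "california", 13, "enron", 9,
]

if __name__ == "__main__":
    print(top_terms(sample))
```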


Thank you for your help in advance! 

-- 
Alisa Zhila