lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Facet ignoring repeated word
Date Tue, 10 May 2016 08:22:04 GMT
On Fri, 2016-04-29 at 08:55 +0000, G, Rajesh wrote:
> I am trying to implement word cloud<https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fwww.whitehouse.gov%2Fsites%2Fdefault%2Ffiles%2Fother%2Fsotu_wordle.png&imgrefurl=https%3A%2F%2Fwww.whitehouse.gov%2Fblog%2F2011%2F01%2F26%2Fstate-union-word-cloud-jobs-america-people-new&docid=eZ_HvQpd9FRBKM&tbnid=qyIc-elv6z-0iM%3A&w=895&h=406&bih=643&biw=1366&ved=0ahUKEwie_8XjurPMAhXLaRQKHWiFDFAQMwgyKAAwAA&iact=mrc&uact=8>
> using Solr.  The problem I have is Solr facet query ignores repeated words in a document
> eg.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.

2) Create a stats-request with the same q as you used for faceting, with
a termfreq-function for each term in your facet result.
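
If you want to generate the request for step 2 from the facet result of step 1, a minimal Python sketch could look like the one below. The helper names (facet_terms, build_stats_params, stats_url) are made up for illustration. It assumes the default flat JSON facet layout ([term, count, term, count, ...]) and terms that contain no single quotes; anything else would need escaping first.

```python
from urllib.parse import urlencode


def facet_terms(facet_field_values):
    """Pick the terms out of a flat facet list: [term, count, term, count, ...].

    Assumption: the facet response uses Solr's default flat JSON layout.
    """
    return facet_field_values[0::2]


def build_stats_params(query, field, terms):
    """Build request parameters with one termfreq stats.field per facet term.

    Assumption: terms contain no single quotes (no escaping is done here).
    """
    params = [("q", query), ("wt", "json"), ("stats", "true")]
    for term in terms:
        func = "{!sum=true func}termfreq('%s', '%s')" % (field, term)
        params.append(("stats.field", func))
    return params


def stats_url(select_url, params):
    """Append the URL-encoded parameters to the select handler URL."""
    return select_url + "?" + urlencode(params)
```

For the example in this mail that would be something like stats_url("http://localhost:8983/solr/techproducts/select", build_stats_params("name:ddr", "name", ["ddr", "1GB"])).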


Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr%0A
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%27ddr%27)
&stats.field={!sum=true%20func}termfreq(%27name%27,%20%271GB%27)

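where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB'
are two terms ('absorbed', 'am', 'believe' etc. in your setup). termfreq
gives the number of times the term occurs in the field for each document,
and sum=true adds that up over all documents matching q, so repeated
words are counted.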


The result will be something like

  "response": {
    "numFound": 3,
...
  "stats": {
    "stats_fields": {
      "termfreq('name', 'ddr')": {
        "sum": 6
      },
      "termfreq('name', '1GB')": {
        "sum": 3
      }
    }
  }
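
To turn such a response into weights for the word cloud, something like the following Python sketch might do. The function name word_weights is made up, and it assumes the keys under stats_fields echo the function exactly as it was sent, as in the output above.

```python
def word_weights(solr_response, field):
    """Map each term to its summed term frequency.

    Assumption: keys under stats_fields look like termfreq('<field>', '<term>'),
    matching the way the stats.field functions were written in the request.
    """
    prefix = "termfreq('%s', '" % field
    suffix = "')"
    weights = {}
    for key, stats in solr_response["stats"]["stats_fields"].items():
        if key.startswith(prefix) and key.endswith(suffix):
            term = key[len(prefix):-len(suffix)]
            weights[term] = stats["sum"]
    return weights
```

The input is the parsed JSON response (e.g. from json.loads), and the resulting dict can be fed to whatever renders the cloud.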


- Toke Eskildsen, State and University Library, Denmark


