lucene-solr-user mailing list archives

From terhorst <terho...@gmail.com>
Subject Re: How does solr.StrField handle punctuation?
Date Wed, 18 Jun 2008 02:20:15 GMT

Here are the exact query strings I'm using. The only modification I made is
to change the output formatter from Ruby to XML and run the output through a
pretty printer.

This is the one that returns the facet.fields I'm interested in. The problem
field is the first one returned:

Query:
/solr/select/?facet=true&facet.mincount=1&facet.offset=0&facet.limit=22&wt=xml&rows=0&fl=*,score&start=0&facet.sort=true&q=division_t:%22Accounting%22;last_name_facet+asc&facet.field=company_facet&qt=standard&fq=in_redbook_b:true&debugQuery=true

Response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
      <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">488</int>
            <lst name="params">
                  <str name="facet">true</str>
                  <str name="facet.offset">0</str>
                  <str name="facet.mincount">1</str>
                  <str name="facet.limit">22</str>
                  <str name="wt">xml</str>
                  <str name="rows">0</str>
                  <str name="fl">*,score</str>
                  <str name="debugQuery">true</str>
                  <str name="facet.sort">true</str>
                  <str name="start">0</str>
                  <str name="q">division_t:"Accounting";last_name_facet asc</str>
                  <str name="facet.field">company_facet</str>
                  <str name="qt">standard</str>
                  <str name="fq">in_redbook_b:true</str>
            </lst>
      </lst>
      <result name="response" numFound="16508" start="0" maxScore="4.144086"/>
      <lst name="facet_counts">
            <lst name="facet_queries"/>
            <lst name="facet_fields">
                  <lst name="company_facet">
                        <int name="Deloitte &amp;amp; Touche">4114</int>
                        <int name="Ernst &amp;amp; Young">1379</int>
                        <int name="PricewaterhouseCoopers">1257</int>
                        <int name="KPMG LLP">206</int>
                        <int name="Ernst &amp;amp; Young LLP">154</int>
                        <int name="Weiser LLP">134</int>
                        <int name="WithumSmith+Brown">86</int>
                        <int name="Eisner LLP">80</int>
                        <int name="Rothstein Kass">68</int>
                        <int name="Grant Thornton LLP">64</int>
                        <int name="RSM McGladrey Inc.">56</int>
                        <int name="Deloitte">49</int>
                        <int name="McGladrey &amp;amp; Pullen LLP">49</int>
                        <int name="J.H. Cohn LLP">45</int>
                        <int name="J. H. Cohn LLP">44</int>
                        <int name="Marks Paneth &amp;amp; Shron LLP">42</int>
                        <int name="Amper, Politziner &amp;amp; Mattia PC">41</int>
                        <int name="Marcum &amp;amp; Kliegman LLP">40</int>
                        <int name="Citrin Cooperman &amp;amp; Company LLP">36</int>
                        <int name="Holtz Rubenstein Reminick LLP">36</int>
                        <int name="Mahoney Cohen &amp;amp; Company CPAs P.C.">36</int>
                        <int name="D'Arcangelo &amp;amp; Company LLP">35</int>
                  </lst>
            </lst>
            <lst name="facet_dates"/>
      </lst>
      <lst name="debug">
            <str name="rawquerystring">division_t:"Accounting";last_name_facet asc</str>
            <str name="querystring">division_t:"Accounting";last_name_facet asc</str>
            <str name="parsedquery">division_t:account</str>
            <str name="parsedquery_toString">division_t:account</str>
            <lst name="explain"/>
            <arr name="filter_queries">
                  <str>in_redbook_b:true</str>
            </arr>
            <arr name="parsed_filter_queries">
                  <str>in_redbook_b:true</str>
            </arr>
            <lst name="timing">
                  <double name="time">488.0</double>
                  <lst name="prepare">
                        <double name="time">1.0</double>
                        <lst name="org.apache.solr.handler.component.QueryComponent">
                              <double name="time">1.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.FacetComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.HighlightComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.DebugComponent">
                              <double name="time">0.0</double>
                        </lst>
                  </lst>
                  <lst name="process">
                        <double name="time">487.0</double>
                        <lst name="org.apache.solr.handler.component.QueryComponent">
                              <double name="time">1.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.FacetComponent">
                              <double name="time">486.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.HighlightComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.DebugComponent">
                              <double name="time">0.0</double>
                        </lst>
                  </lst>
            </lst>
      </lst>
</response>


---------------------------------------------------------------------

And then this is the one where I select the first facet.field returned
above, and attempt to pull up those results:


Query: 
/solr/select/?fl=*,score&start=0&wt=json&q=division_t:%22Accounting%22;last_name_facet+asc&qt=standard&fq=company_facet:%22Deloitte+%26+Touche%22&fq=in_redbook_b:true&rows=30&debugQuery=true

Response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
      <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">1</int>
            <lst name="params">
                  <str name="fl">*,score</str>
                  <str name="debugQuery">true</str>
                  <str name="start">0</str>
                  <str name="q">division_t:"Accounting";last_name_facet asc</str>
                  <str name="wt">xml</str>
                  <str name="qt">standard</str>
                  <arr name="fq">
                        <str>company_facet:"Deloitte &amp; Touche"</str>
                        <str>in_redbook_b:true</str>
                  </arr>
                  <str name="rows">30</str>
            </lst>
      </lst>
      <result name="response" numFound="0" start="0" maxScore="0.0"/>
      <lst name="debug">
            <str name="rawquerystring">division_t:"Accounting";last_name_facet asc</str>
            <str name="querystring">division_t:"Accounting";last_name_facet asc</str>
            <str name="parsedquery">division_t:account</str>
            <str name="parsedquery_toString">division_t:account</str>
            <lst name="explain"/>
            <arr name="filter_queries">
                  <str>company_facet:"Deloitte &amp; Touche"</str>
                  <str>in_redbook_b:true</str>
            </arr>
            <arr name="parsed_filter_queries">
                  <str>company_facet:Deloitte &amp; Touche</str>
                  <str>in_redbook_b:true</str>
            </arr>
            <lst name="timing">
                  <double name="time">1.0</double>
                  <lst name="prepare">
                        <double name="time">1.0</double>
                        <lst name="org.apache.solr.handler.component.QueryComponent">
                              <double name="time">1.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.FacetComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.HighlightComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.DebugComponent">
                              <double name="time">0.0</double>
                        </lst>
                  </lst>
                  <lst name="process">
                        <double name="time">0.0</double>
                        <lst name="org.apache.solr.handler.component.QueryComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.FacetComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.HighlightComponent">
                              <double name="time">0.0</double>
                        </lst>
                        <lst name="org.apache.solr.handler.component.DebugComponent">
                              <double name="time">0.0</double>
                        </lst>
                  </lst>
            </lst>
      </lst>
</response>

(The other filter query, in_redbook_b, is a boolean field used to partition
our dataset. It shouldn't account for the difference in results, since it's
in both queries.)
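
For reference, here is a minimal Python sketch of how the second request could be assembled so that the ampersand in the filter value is percent-encoded instead of being read as a parameter separator. The helper name build_select_url is hypothetical, and the parameter values are copied from the query above (with wt=xml, matching the echoed params); it is an illustration under those assumptions, not necessarily how the application builds its requests.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import unescape


def build_select_url(company: str) -> str:
    """Build the filter-query URL, percent-encoding every parameter value.

    Hypothetical helper: the path and parameters mirror the query in this
    message; only the company_facet value varies.
    """
    params = [
        ("fl", "*,score"),
        ("start", "0"),
        ("wt", "xml"),
        ("q", 'division_t:"Accounting";last_name_facet asc'),
        ("qt", "standard"),
        ("fq", f'company_facet:"{company}"'),
        ("fq", "in_redbook_b:true"),
        ("rows", "30"),
        ("debugQuery", "true"),
    ]
    # urlencode applies quote_plus to each value, so "&" becomes "%26"
    # and cannot be mistaken for a separator between parameters.
    return "/solr/select/?" + urlencode(params)


# The value as typed: a plain ampersand.
url_plain = build_select_url("Deloitte & Touche")

# The facet label exactly as it appears in the XML response text above.
# One XML-unescaping pass gives the string Solr reported as the term,
# assuming the doubled escaping is really in the response and was not
# introduced by the pretty printer or the archive.
label_in_xml = "Deloitte &amp;amp; Touche"
url_from_label = build_select_url(unescape(label_in_xml))

print(url_plain)
print(url_from_label)
```

With the plain value, the fq parameter comes out as company_facet%3A%22Deloitte+%26+Touche%22, which decodes to the same filter as the query above. The second URL is only worth trying if the facet labels in the first response are taken literally: unescaped once, they read "Deloitte &amp; Touche" rather than "Deloitte & Touche", and the output above does not settle whether that is what is stored in the index.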

Thanks again for your help, I really appreciate your time.

Jonathan


hossman wrote:
> 
> 
> : Thanks for the reply. I was in a hurry and made the URL up to illustrate my
> : point. The real query string is more like what you suggest. In any case I'm
> : certain that the actual query being used is valid (Solr would complain if it
> : weren't) and that the ampersand is somehow affecting results. Is there any
> 
> no, actually it wouldn't complain in that case ... a URL param with a name 
> it's not expecting would just be ignored.
> 
> if you send us the exact URLs you're having problems with there may be 
> other nuances about it that we can spot to help figure out your problem. 
> (for example: are you absolutely sure the ampersand in your field value is 
> URL escaped?)
> 
> : way I can get Solr to dump some information about how it stores indexes,
> : keys, etc. for a certain record? I'm wondering if the ampersand was handled
> : in a weird way by my application when the records were added to the index.
> : (Although I doubt this since it shows up properly in the facets.) Thanks
> : again for your help.
> 
> yep, there are a couple of things you can do in general to 
> troubleshoot things like this...
> 
> 1) debugQuery=true ... add that param into your URL and Solr will give you 
> some nice debugging info about how your queries are being parsed.  this is 
> important to post when asking followup questions.
> 
> 2) analysis.jsp ... this is the "Analysis" link on the admin page, it will 
> show you how your analyzer is treating the fields you index ... but this 
> isn't really relevant to your specific problem since you are using 
> StrField.
> 
> 3) LukeRequestHandler, in the example schema it's mapped to /admin/luke 
> ... this will let you see the actual terms indexed for your fields ... but 
> as you said, this isn't going to be much help for you in this specific 
> case since you used facet.field to get the value in the first place -- 
> that means it's definitely indexed that way.
> 
> debugQuery=true is definitely your best first step ... send us the exact 
> URLs you're having problems with (that have debugQuery=true) along with the 
> full output of that URL and people can probably help spot your problem.
> 
> 
> 
> -Hoss
> 
> 
> 
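
The LukeRequestHandler step quoted above can be sketched as a request URL. This is a rough Python illustration, not a tested recipe: /admin/luke is the mapping from the example schema mentioned in the quote, fl and numTerms are LukeRequestHandler parameters, and the host, port, and helper name luke_terms_url are placeholders assumed for the example.

```python
from urllib.parse import urlencode

# Assumption: placeholder host and port; substitute the real Solr location.
SOLR_BASE = "http://localhost:8983/solr"


def luke_terms_url(field: str, num_terms: int = 20) -> str:
    """URL asking LukeRequestHandler for the top indexed terms of one field.

    Hypothetical helper; /admin/luke is the path used in the example schema.
    """
    query = urlencode({"fl": field, "numTerms": num_terms})
    return f"{SOLR_BASE}/admin/luke?{query}"


# Inspect the raw terms stored for the facet field discussed in this thread.
print(luke_terms_url("company_facet"))
```

Comparing the terms it reports for company_facet character by character against the fq value is one way to check that the filter string matches what was indexed.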


