From: Apache Wiki
To: Apache Wiki
Date: Mon, 12 Mar 2012 13:56:26 -0000
Message-ID: <20120312135626.71222.74505@eos.apache.org>
Subject: [Solr Wiki] Update of "ExtendedDisMax" by JanHoydahl

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "ExtendedDisMax" page has been changed by JanHoydahl:
http://wiki.apache.org/solr/ExtendedDisMax?action=diff&rev1=2&rev2=3

Comment:
First real page for eDisMax

- Placeholder page for describing Extended DisMax Qparser. Plan to copy the [[DisMaxQParserPlugin]] page as a start...
+ [[Solr3.1]] The Extended DisMax Query Parser is a robust parser designed to process advanced user input directly. It is built on the original DisMaxQParserPlugin but adds many features. It searches for the query words across multiple fields with different boosts, based on the significance of each field. Additional options let you influence the score based on rules specific to each use case (independent of user input). The DisMax page has more background on the conceptual origins and behavior.
+
+ <>
+
+ == Overview ==
+ The handler takes responsibility for building a good query from the user's input using !BooleanQueries containing !DisjunctionMaxQueries across the fields and boosts you specify. It also lets you provide additional boosting queries, boosting functions, and filtering queries. These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL. [[Solr3.6]] You can choose which fields the end user is allowed to query, and choose to disallow direct fielded searches if desired.
+
+ == Query Syntax ==
+ This parser supports full Lucene !QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy search, grouping with parens, phrase search, phrase slop, numeric ranges, wildcard search and more. If there is a syntax error in the input, such as a nonexistent field name or unbalanced double-quotes, the input is gracefully searched as literal strings.
+
+ == Query Structure ==
+ For each "word" in the query string, dismax builds a !DisjunctionMaxQuery object for that word across all of the fields in the `qf` param (with the appropriate boost values and a tiebreaker value set from the `tie` param). These !DisjunctionMaxQuery objects are then put in a !BooleanQuery with the minNumberShouldMatch option set according to the `mm` param. If any other params are specified, a larger !BooleanQuery is wrapped around the first !BooleanQuery from the `qf` options, and the other params (`bf`, `bq`, `pf`) are added as optional clauses. The only complex clause comes from the `pf` param, which is a single !DisjunctionMaxQuery containing the whole query 'phrase' against each of the `pf` fields.
+
+ /!\ :TODO: /!\ Need more detail on the query structure generated based on input ... a picture would be nice.
+
+ == Configuration ==
+ Extended !DisMax is already configured in the example schema, with the name *edismax*. Thus, to select the parser, use {{{defType=edismax}}} in your query, or use the local-param syntax {!edismax}.
+
+ == Field aliasing / renaming ==
+ [[Solr3.6]] You may provide virtual alias fields for users to query. This is useful either to provide a localized or easier name than what happens to be in the schema, or to provide an alias for a group of fields to support more advanced use cases such as 'what' and 'where' queries, even if there are no physical 'what' or 'where' fields.
+
+ The syntax for aliasing is {{{f.myalias.qf=realfield}}}. A user query for {{{myalias:foo}}} will be queried as {{{realfield:foo}}}.
+
+ The alias may also point to multiple fields, with weights. Let's imagine you have a schema with fields {{{name, namealias, address, city, state}}}, and you want to provide a 'who' and 'where' search.
You could then configure aliases like this: {{{&f.who.qf=name^5.0,namealias^2.0&f.where.qf=address^1.0,city^10.0,state}}}. Any user query for {{{who:foo}}} would expand to a DisMax query across the fields name and namealias. If you further want to hide the real field names, you can combine this with the "User Fields" feature, and say {{{&uf=who,where}}} to only allow fielded search for those two aliases.
+
+
+ == Parameters ==
+ The following parameters are supported, either as regular request params or as local params.
+
+ === q.alt ===
+ If specified, this query will be used (and parsed by default using standard query parsing syntax) when the main query string is not specified or blank. This comes in handy when you need something like a match-all-docs query (don't forget &rows=0 for that one!) in order to get collection-wide faceting counts.
+
+ === qf (Query Fields) ===
+ List of fields and the "boosts" to associate with each of them when building !DisjunctionMaxQueries from the user's query. The format supported is {{{fieldOne^2.3 fieldTwo fieldThree^0.4}}}, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree.
+
+ === mm (Minimum 'Should' Match) ===
+ When dealing with queries there are 3 types of "clauses" that Lucene knows about: mandatory, prohibited, and 'optional' (aka: "SHOULD"). By default all words or phrases specified in the "q" param are treated as "optional" clauses unless they are preceded by a "+" or a "-". When dealing with these "optional" clauses, the "mm" option makes it possible to say that a certain minimum number of those clauses must match (mm). Specifying this minimum number can be done in complex ways, equating to ideas like...
+
+  * At least 2 of the optional clauses must match, regardless of how many clauses there are: "{{{2}}}"
+  * At least 75% of the optional clauses must match, rounded down: "{{{75%}}}"
+  * If there are fewer than 3 optional clauses, they all must match; if there are 3 or more, then 75% must match, rounded up: "{{{2<-25%}}}"
+  * If there are fewer than 3 optional clauses, they all must match; for 3 to 5 clauses, one less than the number of clauses must match; for 6 or more clauses, 80% must match, rounded down: "{{{2<-1 5<80%}}}"
+
+ The full variety of complex expressions supported is explained in detail [[http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html|here]].
+
+ From [[Solr4.0]], the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml entry. In older versions of Solr the default value of 'mm' is 100% (all clauses must match).
+
+ === qs (Query Phrase Slop) ===
+ Amount of slop on phrase queries explicitly included in the user's query string (in qf fields; affects matching). [[Solr1.3]]
+
+ === pf (Phrase Fields) ===
+ Once the list of matching documents has been identified using the "fq" and "qf" params, the "pf" param can be used to "boost" the score of documents in cases where all of the terms in the "q" param appear in close proximity.
+
+ The format is the same as the "qf" param: a list of fields and "boosts" to associate with each of them when making phrase queries out of the entire "q" param.
+
+ === ps (Phrase Slop) ===
+ Amount of slop on phrase queries built for "pf" fields (affects boosting).
+
+ === pf2 (Phrase bigram fields) ===
+ As with 'pf' but chops the input into bi-grams, e.g.
"the brown fox jumpe= d" is queried as "the brown" "brown fox" "fox jumped" + = + =3D=3D=3D ps2 (Phrase bigram slop) =3D=3D=3D + As with 'ps' but controls the slop factor for 'pf2' + = + =3D=3D=3D pf3 (Phrase trigram fields) =3D=3D=3D + As with 'pf' but chops the input into tri-grams, e.g. "the brown fox jump= ed" is queried as "the brown fox" "brown fox jumped" + = + =3D=3D=3D ps3 (Phrase trigram slop) =3D=3D=3D + As with 'ps' but controls the slop factor for 'pf2' + = + =3D=3D=3D tie (Tie breaker) =3D=3D=3D + Float value to use as tiebreaker in !DisjunctionMaxQueries (should be som= ething much less than 1) + = + When a term from the users input is tested against multiple fields, more = than one field may match and each field will generate a different score bas= ed on how common that word is in that field (for each document relative to = all other documents). By default the score from the field with the maximum = score is used. If two documents both have a matching score, the tie parame= ter has the effect of breaking the tie. + When a tie parameter is specified the scores from other matching fields a= re added to the score of the maximum scoring field: = + = + (score of matching clause with the highest score) + ( (tie paramenter) * = (scores of any other matching clauses) ) + = + The "tie" param let's you configure how much the final score of the query= will be influenced by the scores of the lower scoring fields compared to t= he highest scoring field. + = + A value of "0.0" makes the query a pure "disjunction max query" -- only t= he maximum scoring sub query contributes to the final score. A value of "1= .0" makes the query a pure "disjunction sum query" where it doesn't matter = what the maximum scoring sub query is, the final score is the sum of the su= b scores. Typically a low value (ie: 0.1) is useful. 
+
+ === bq (Boost Query) ===
+ A raw query string (in the SolrQuerySyntax) that will be included with the user's query to influence the score. If this is a !BooleanQuery with a default boost (1.0f) then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is.
+
+ /!\ :TODO: /!\ That latter part is deprecated behavior but still works. It can be problematic so avoid it.
+
+ === bf (Boost Function, additive) ===
+ [[FunctionQuery|Functions]] (with optional boosts) that will be included in the user's query to influence the score. Any function supported natively by Solr can be used, along with a boost value, e.g.: recip(rord(myfield),1,2,3)^1.5
+
+ Specifying functions with the "bf" param is just shorthand for using the {{{_val_:"...function..."}}} syntax in a "bq" param.
+
+ For example, if you want to show more recent documents first, use recip(ms(NOW,mydatefield),3.16e-11,1,1). See FunctionQuery for more functions.
+
+ The bf parameter may be specified multiple times.
+
+ === boost (Boost Function, multiplicative) ===
+ As for 'bf' but multiplies the boost into the score.
+
+ === uf (User Fields) ===
+ Specifies which schema fields the end user shall be allowed to query for explicitly. This parameter supports wildcards.
+
+ The default is to allow all fields, equivalent to {{{&uf=*}}}. To allow only the title field, use {{{&uf=title}}}; to allow title and all fields ending with _s, use {{{&uf=title *_s}}}.
To allow all fields except title, use {{{&uf=* -title}}}
+
+ == Examples ==
+ /!\ :TODO: /!\ cleanup and expand examples
+
+ Search across multiple fields, specifying (via boosts) how important each field is relative to the others
+
+ {{{
+ http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3
+ }}}
+ You can boost results that have a field that matches a specific value...
+
+ {{{
+ http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3&bq=cat:electronics^5.0
+ }}}
+ Using the "mm" param with a value of 2, one- and two-word queries require that all of the optional clauses match, while a three-word query may leave one clause unmatched...
+
+ {{{
+ http://localhost:8983/solr/select/?q=belkin+ipod&defType=edismax&mm=2
+ http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&defType=edismax&mm=2
+ http://localhost:8983/solr/select/?q=belkin+ipod+apple&defType=edismax&mm=2
+ }}}
+

  == References ==
   * [[https://issues.apache.org/jira/browse/SOLR-2368|SOLR-2368]] tracks improvements to eDisMax