From: Apache Wiki
To: Apache Wiki
Date: Tue, 01
Mar 2016 22:26:57 -0000
Message-ID: <20160301222657.96747.88703@eos.apache.org>
Subject: [Solr Wiki] Update of "ExtendedDisMax" by JanHoydahl
Auto-Submitted: auto-generated

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ExtendedDisMax" page has been changed by JanHoydahl:
https://wiki.apache.org/solr/ExtendedDisMax?action=diff&rev1=17&rev2=18

Comment:
Point to RefGuide

- [[Solr3.1]] The Extended DisMax Query Parser (eDisMax) is a robust parser designed to process advanced user input directly. It is built on the original DisMaxQParserPlugin but adds many features. It searches for the query words across multiple fields with different boosts, based on the significance of each field. Additional options let you influence the score based on rules specific to each use case (independent of user input). The DisMax page has more background on the conceptual origins and behavior.
+ This documentation has moved to the official Reference Guide: https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser.

- <>
+ The reference guide is also available in PDF format for each specific Solr release, see https://archive.apache.org/dist/lucene/solr/ref-guide/

- == Overview ==
- The parser takes responsibility for building a good query from the user's input using !BooleanQueries containing !DisjunctionMaxQueries across fields and boosts you specify. It also lets you provide additional boosting queries, boosting functions, and filtering queries. These options can all be specified as default parameters for the handler in your solrconfig.xml or overridden in the Solr query URL. [[Solr3.6]] You can choose which fields the end user is allowed to query, and choose to disallow direct fielded searches if wanted.
-
- == Query Syntax ==
- This parser supports full Lucene !QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy, grouping with parens, phrase search, phrase slop, numeric ranges, wildcard search and more. If there is a syntax error in the input, such as a non-existent field name or unbalanced double quotes, the input is gracefully searched as literal strings.
-
- == Query Structure ==
- For each "word" in the query string, dismax builds a !DisjunctionMaxQuery object for that word across all of the fields in the `qf` param (with the appropriate boost values and a tiebreaker value set from the `tie` param). These !DisjunctionMaxQuery objects are then put in a !BooleanQuery with the minNumberShouldMatch option set according to the `mm` param. If any other params are specified, a larger !BooleanQuery is wrapped around the first !BooleanQuery from the `qf` options, and the other params (`bf`, `bq`, `pf`, `pf2`, `pf3`, `ps2`, `ps3`) are added as optional clauses. The only complex clause comes from the `pf` param, which is a single !DisjunctionMaxQuery containing the whole query 'phrase' against each of the `pf` fields.
-
- /!\ :TODO: /!\ Need more detail on the query structure generated based on input ... a picture would be nice.
-
- == Configuration ==
- Extended !DisMax is already configured in the example configuration, with the name '''edismax'''. Thus, to select the parser, use {{{defType=edismax}}} in your query, or use the local-param syntax {!edismax}
-
- == Field aliasing / renaming ==
- [[Solr3.6]] You may provide virtual alias fields for users to query.
This is useful either to provide a localized or easier name than what happens to be in the schema, or to provide an alias for a group of fields to support more advanced use cases such as 'what' and 'where' queries, even if there are no physical 'what' or 'where' fields.
-
- The syntax for aliasing is {{{f.myalias.qf=realfield}}}. A user query for {{{myalias:foo}}} will be queried as {{{realfield:foo}}}.
-
- The alias may also refer to multiple fields, with boost factors, by listing the field names with a space between them, and the optional boost factor immediately following the field name and the caret ('{{{^}}}') operator. Let's imagine you have a schema with fields {{{name, namealias, address, city, state}}}, and you want to provide a 'who' and 'where' search. You could then configure aliases like this: {{{&f.who.qf=name^5.0+namealias^2.0&f.where.qf=address^1.0+city^10.0+state}}}. Any user query for {{{who:foo}}} would expand to a DisMax query across fields name and namealias. If you further want to hide the real field names, you can combine this with the "User Fields" feature, and say {{{&uf=who,where}}} to only allow fielded search for those two aliases.
-
-
- == Parameters ==
- The following parameters are supported, either as regular request params, or as local params
-
- === q.alt ===
- If specified, this query will be used (and parsed by default using standard query parsing syntax) when the main query string is not specified or is blank. This comes in handy when you need something like a match-all-docs query (don't forget &rows=0 for that one!) in order to get collection-wide faceting counts.
-
- === qf (Query Fields) ===
- List of fields and the "boosts" to associate with each of them when building !DisjunctionMaxQueries from the user's query.
The format supported is {{{fieldOne^2.3 fieldTwo fieldThree^0.4}}}, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree.
-
- === mm (Minimum 'Should' Match) ===
- When dealing with queries there are 3 types of "clauses" that Lucene knows about: mandatory, prohibited, and 'optional' (aka: "SHOULD"). By default all words or phrases specified in the "q" param are treated as "optional" clauses unless they are preceded by a "+" or a "-". When dealing with these "optional" clauses, the "mm" option makes it possible to say that a certain minimum number of those clauses must match (mm). Specifying this minimum number can be done in complex ways, equating to ideas like...
-
-  * At least 2 of the optional clauses must match, regardless of how many clauses there are: "{{{2}}}"
-  * At least 75% of the optional clauses must match, rounded down: "{{{75%}}}"
-  * If there are fewer than 3 optional clauses, they all must match; if there are 3 or more, then 75% must match, rounded up: "{{{2<-25%}}}"
-  * If there are fewer than 3 optional clauses, they all must match; for 3 to 5 clauses, one less than the number of clauses must match; for 6 or more clauses, 80% must match, rounded down: "{{{2<-1 5<80%}}}"
-
- Full details on the variety of complex expressions supported are explained [[http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html|here]].
-
- From [[Solr4.0]], the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml entry. In older versions of Solr the default value of 'mm' is 100% (all clauses must match).
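The mm expressions listed above can be worked through with a small Python sketch. It is written from the description in this section, not taken from Solr's source; the function name `min_should_match` is hypothetical, and the sketch assumes there are no spaces around `<` in the expression.

```python
def min_should_match(spec, optional_clauses):
    """Return how many optional clauses must match for a given mm expression.

    Illustrative sketch only: it follows the rules described in the text,
    not Solr's actual implementation.
    """
    required = optional_clauses
    spec = spec.strip()

    if "<" in spec:
        # Conditional form "N<expr": expr applies only when there are
        # more than N optional clauses; otherwise all clauses are required
        # (or whatever an earlier condition already decided).
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if optional_clauses <= int(bound):
                return required
            required = min_should_match(expr, optional_clauses)
        return required

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The computed amount is truncated toward zero, so "75%" rounds the
        # requirement down while "-25%" effectively rounds it up.
        amount = optional_clauses * abs(percent) // 100
        required = optional_clauses - amount if percent < 0 else amount
    else:
        number = int(spec)
        # A negative number means "this many clauses may be missing".
        required = optional_clauses + number if number < 0 else number

    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(optional_clauses, required))
```

For instance, under these assumptions `min_should_match("2<-1 5<80%", 4)` evaluates the first condition (4 is more than 2, so one clause may be missing) and stops at the second (4 is not more than 5), so three clauses are required.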
-
- === qs (Query Phrase Slop) ===
- Amount of slop on phrase queries explicitly included in the user's query string (in qf fields; affects matching). [[Solr1.3]]
-
- === pf (Phrase Fields) ===
- Once the list of matching documents has been identified using the "fq" and "qf" params, the "pf" param can be used to "boost" the score of documents in cases where all of the terms in the "q" param appear in close proximity.
-
- The format is the same as the "qf" param: a list of fields and "boosts" to associate with each of them when making phrase queries out of the entire "q" param.
-
- [[Solr4.0]] You can also specify an optional slop factor directly in "pf" with the syntax {{{field~slop}}}. To specify both a slop and a boost, use {{{field~slop^boost}}}. Example: {{{title~2^10.0}}} will use the title field with a phrase slop of 2 and a boost of 10.0. A phrase slop specified here overrides the default specified in "ps". See SOLR-2058.
-
- === ps (Phrase Slop) ===
- Default amount of slop on phrase queries built with "pf", "pf2" and/or "pf3" fields (affects boosting).
-
- === pf2 (Phrase bigram fields) ===
- As with 'pf' but chops the input into bi-grams, e.g. "the brown fox jumped" is queried as "the brown" "brown fox" "fox jumped"
-
- === ps2 (Phrase bigram slop) ===
-
- [[Solr4.0]] As with 'ps' but sets default slop factor for 'pf2'. If not specified, 'ps' will be used.
-
- === pf3 (Phrase trigram fields) ===
- As with 'pf' but chops the input into tri-grams, e.g. "the brown fox jumped" is queried as "the brown fox" "brown fox jumped"
-
- === ps3 (Phrase trigram slop) ===
-
- [[Solr4.0]] As with 'ps' but sets default slop factor for 'pf3'. If not specified, 'ps' will be used.
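The way pf2 and pf3 chop the input into word bi-grams and tri-grams can be illustrated with a short Python sketch. The helper name `word_ngrams` is hypothetical, and splitting on whitespace is a simplifying assumption; the real parser works on the analyzed query, not on raw strings.

```python
def word_ngrams(query, n):
    """Split a query into overlapping word n-grams (illustrative only)."""
    words = query.split()
    # Slide a window of n words across the query; a query shorter than
    # n words yields no phrases at all.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = word_ngrams("the brown fox jumped", 2)   # phrases as described for pf2
trigrams = word_ngrams("the brown fox jumped", 3)  # phrases as described for pf3
```

With the example from the text, `bigrams` holds "the brown", "brown fox" and "fox jumped", and `trigrams` holds "the brown fox" and "brown fox jumped".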
-
- === tie (Tie breaker) ===
- Float value to use as tiebreaker in !DisjunctionMaxQueries (should be something much less than 1)
-
- When a term from the user's input is tested against multiple fields, more than one field may match and each field will generate a different score based on how common that word is in that field (for each document relative to all other documents). By default the score from the field with the maximum score is used. If two documents both have a matching score, the tie parameter has the effect of breaking the tie.
- When a tie parameter is specified the scores from other matching fields are added to the score of the maximum scoring field:
-
-  (score of matching clause with the highest score) + ( (tie parameter) * (scores of any other matching clauses) )
-
- The "tie" param lets you configure how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.
-
- A value of "0.0" makes the query a pure "disjunction max query" -- only the maximum scoring sub query contributes to the final score. A value of "1.0" makes the query a pure "disjunction sum query" where it doesn't matter what the maximum scoring sub query is, the final score is the sum of the sub scores. Typically a low value (e.g. 0.1) is useful.
-
- === bq (Boost Query) ===
- A raw query string (in the SolrQuerySyntax) that will be included with the user's query to influence the score. If this is a !BooleanQuery with a default boost (1.0f) then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is.
-
- /!\ :TODO: /!\ That latter part is deprecated behavior but still works. It can be problematic so avoid it.
-
- === bf (Boost Function, additive) ===
- [[FunctionQuery|Functions]] (with optional boosts) that will be included in the user's query to influence the score.
Any function supported natively by Solr can be used, along with a boost value, e.g.: recip(rord(myfield),1,2,3)^1.5
-
- Specifying functions with the "bf" param is just shorthand for using the {{{_val_:"...function..."}}} syntax in a "bq" param.
-
- For example, if you want to show more recent documents first, use recip(ms(NOW,mydatefield),3.16e-11,1,1). See FunctionQuery for more functions.
-
- The bf parameter may be specified multiple times.
-
- === boost (Boost Function, multiplicative) ===
- As for 'bf' but multiplies the boost into the score.
-
- === uf (User Fields) ===
- Specifies which schema fields the end user shall be allowed to query for explicitly. This parameter supports wildcards.
-
- The default is to allow all fields, equivalent to {{{&uf=*}}}. To allow only the title field, use {{{&uf=title}}}; to allow title and all fields ending with _s, use {{{&uf=title *_s}}}. To allow all fields except title, use {{{&uf=* -title}}}. To disallow all fielded searches, use {{{&uf=-*}}}.
-
- The uf parameter was introduced in [[Solr3.6]]
-
- === lowercaseOperators ===
- This param controls whether to try to interpret lowercase words as boolean operators such as "and" and "or". Set {{{&lowercaseOperators=true}}} to allow this. Default is "true".
-
- Please see [[https://issues.apache.org/jira/browse/SOLR-3580|SOLR-3580]] for patches enabling lowercase "not" operator support.
-
- == Examples ==
- /!\ :TODO: /!\ cleanup and expand examples
-
- Search across multiple fields, specifying (via boosts) how important each field is relative to each other
-
- {{{
- http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3
- }}}
- You can boost results that have a field that matches a specific value...
-
- {{{
- http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3&bq=cat:electronics^5.0
- }}}
- Using the "mm" param, 1 and 2 word queries require that all of the optional clauses match, but for queries with three or more clauses one missing clause is allowed...
-
- {{{
- http://localhost:8983/solr/select/?q=belkin+ipod&defType=edismax&mm=2
- http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&defType=edismax&mm=2
- http://localhost:8983/solr/select/?q=belkin+ipod+apple&defType=edismax&mm=2
- }}}
-
-
- == References ==
-  * [[https://issues.apache.org/jira/browse/SOLR-2368|SOLR-2368]] tracks improvements to eDisMax
-
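As a worked illustration of the tie-breaker formula quoted in the "tie" section, the following Python sketch combines per-field scores for a single term. The function name `dismax_score` and the sample scores are hypothetical; real Lucene scores depend on the index, so the numbers only show how the formula behaves.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way the tie formula describes.

    Illustrative sketch only: highest score plus tie times the sum of
    the remaining scores.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best  # scores of the other matching clauses
    return best + tie * others


scores = [2.0, 1.0, 0.5]                 # made-up scores from three fields
pure_max = dismax_score(scores, tie=0.0)  # only the best field counts
pure_sum = dismax_score(scores, tie=1.0)  # all fields count fully
blended = dismax_score(scores, tie=0.1)   # lower scoring fields count a little
```

With these sample scores, a tie of 0.0 returns the maximum alone, a tie of 1.0 returns the plain sum, and a small tie such as 0.1 lands just above the maximum.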