lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ExtendedDisMax" by JanHoydahl
Date Mon, 12 Mar 2012 13:56:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ExtendedDisMax" page has been changed by JanHoydahl:
http://wiki.apache.org/solr/ExtendedDisMax?action=diff&rev1=2&rev2=3

Comment:
First real page for eDisMax

- Placeholder page for describing Extended DisMax Qparser. Plan to copy the [[DisMaxQParserPlugin]]
page as a start...
+ <!> [[Solr3.1]] The Extended DisMax Query Parser is a robust parser designed to process
advanced user input directly. It is built on the original DisMaxQParserPlugin but adds many
features. It searches for the query words across multiple fields with different boosts, based
on the significance of each field. Additional options let you influence the score based on
rules specific to each use case (independent of user input).  The DisMax page has more background
on the conceptual origins and behavior.
+ 
+ <<TableOfContents>>
+ 
+ == Overview ==
+ The handler takes responsibility for building a good query from the user's input using !BooleanQueries
containing !DisjunctionMaxQueries across fields and boosts you specify.  It also lets you
provide additional boosting queries, boosting functions, and filtering queries. These options
can all be specified as default parameters for the handler in your solrconfig.xml or overridden
in the Solr query URL. <!> [[Solr3.6]] You can choose which fields the end user is allowed
to query, and choose to disallow direct fielded searches if wanted.
+ 
+ == Query Syntax ==
+ This parser supports full Lucene !QueryParser syntax including boolean operators 'AND',
'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy, grouping with parens, phrase
search, phrase slop, numeric ranges, wildcard search and more. If there is a syntax error
in the input, such as a nonexistent field name or unbalanced double quotes, the input is gracefully
searched as literal strings.
+ 
+ == Query Structure ==
+ For each "word" in the query string, dismax builds a !DisjunctionMaxQuery object for that
word across all of the fields in the `qf` param (with the appropriate boost values and a tiebreaker
value set from the `tie` param).  These !DisjunctionMaxQuery objects are then put in a !BooleanQuery
with the minNumberShouldMatch option set according to the `mm` param.  If any other params
are specified, a larger !BooleanQuery is wrapped around the first !BooleanQuery from the
`qf` options, and the other params (`bf`, `bq`, `pf`) are added as optional clauses.  The
only complex clause comes from the `pf` param, which is a single !DisjunctionMaxQuery
containing the whole query 'phrase' against each of the `pf` fields.
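+ 
+ The nesting described above can be sketched as plain data. This is only an illustration: the function name `sketch_query`, the dict layout, and the sample fields and boosts are assumptions made for this example, and the result is not what Solr itself produces.

```python
from typing import Dict, List


def sketch_query(words: List[str], qf: Dict[str, float], pf: Dict[str, float],
                 tie: float, mm: int) -> dict:
    """Return a nested dict mirroring the query structure described in the text."""
    # One DisjunctionMaxQuery per word, spanning every qf field with its boost.
    per_word = [
        {"dismax": [f"{field}:{word}^{boost}" for field, boost in qf.items()],
         "tie": tie}
        for word in words
    ]
    # The per-word queries sit in a BooleanQuery governed by mm.
    main = {"boolean": per_word, "minNumberShouldMatch": mm}
    # pf contributes a single DisjunctionMaxQuery over the whole query phrase.
    phrase = '"' + " ".join(words) + '"'
    phrase_clause = {
        "dismax": [f"{field}:{phrase}^{boost}" for field, boost in pf.items()]
    }
    # Outer BooleanQuery: the main query plus the optional boosting clauses.
    return {"boolean": [main], "optional": [phrase_clause]}


query = sketch_query(["belkin", "ipod"], {"features": 20.0, "text": 0.3},
                     {"text": 2.0}, tie=0.1, mm=2)
print(query)
```
+ Clauses from `bf` and `bq` would join the `optional` list in the same way as the `pf` clause.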
+ 
+ /!\ :TODO: /!\ Need more detail on the query structure generated based on input ... a picture
would be nice.
+ 
+ == Configuration ==
+ Extended !DisMax is already available in the example configuration under the name `edismax`. Thus,
to select the parser, use {{{defType=edismax}}} in your query, or use the local-param syntax
`{!edismax}`.
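+ 
+ A minimal sketch of both ways to select the parser from a client, using only the Python standard library. The host, port and path are taken from the example URLs on this page; the query term `video` is just a sample value.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select/"

# Option 1: select the parser for the whole request with defType.
by_deftype = urlencode({"q": "video", "defType": "edismax"})

# Option 2: select it inline by putting the local-param prefix in front of the query.
by_local_param = urlencode({"q": "{!edismax}video"})

print(BASE + "?" + by_deftype)
print(BASE + "?" + by_local_param)
```
+ `urlencode` percent-encodes the braces and the exclamation mark, which avoids problems with clients that reject them in a raw URL.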
+ 
+ == Field aliasing / renaming ==
+ <!> [[Solr3.6]] You may provide virtual alias fields for users to query. This is useful
either to provide a localized or easier name than what happens to be in the schema, or to
provide an alias for a group of fields to support more advanced use cases such as 'what' and
'where' queries, even if there are no physical 'what' or 'where' fields.
+ 
+ The syntax for aliasing is {{{f.myalias.qf=realfield}}}. A user query for {{{myalias:foo}}}
will be queried as {{{realfield:foo}}}.
+ 
+ The alias may also point to multiple fields, with weights. Let's imagine you have a schema
with fields {{{name, namealias, address, city, state}}}, and you want to provide a 'who' and
'where' search. You could then configure aliases like this: {{{&f.who.qf=name^5.0+namealias^2.0&f.where.qf=address^1.0+city^10.0+state}}}.
Any user query for {{{who:foo}}} would expand to a DisMax query across the fields name and namealias.
If you further want to hide the real field names, you can combine this with the "User Fields"
feature, and say {{{&uf=who+where}}} to only allow fielded search for those two aliases.
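+ 
+ To make the expansion concrete, here is a toy model of alias resolution. The `ALIASES` table and the `expand_alias` helper are hypothetical names invented for this sketch; they mimic the 'who' and 'where' example above and are not part of Solr.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of f.who.qf and f.where.qf.
# None means the field was listed without an explicit boost.
ALIASES: Dict[str, Dict[str, Optional[float]]] = {
    "who": {"name": 5.0, "namealias": 2.0},
    "where": {"address": 1.0, "city": 10.0, "state": None},
}


def expand_alias(field: str, term: str) -> List[str]:
    """Expand alias:term into one clause per real field; pass other fields through."""
    targets = ALIASES.get(field)
    if targets is None:
        return [f"{field}:{term}"]
    return [
        f"{real}:{term}" if boost is None else f"{real}:{term}^{boost}"
        for real, boost in targets.items()
    ]


print(expand_alias("who", "foo"))
```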
+ 
+ 
+ == Parameters ==
+ The following parameters are supported, either as regular request params, or as local params.
+ 
+ === q.alt ===
+ If specified, this query will be used (and parsed by default using standard query parsing
syntax) when the main query string is not specified or blank.  This comes in handy when you
need something like a match-all-docs query (don't forget &rows=0 for that one!) in order
to get collection-wide faceting counts.
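+ 
+ As a sketch of that use case, the request below omits `q` entirely and relies on `q.alt` to match every document. The facet field `cat` is an assumption borrowed from the examples further down this page.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q.alt": "*:*",        # used because no q is supplied
    "rows": "0",           # return counts only, no documents
    "facet": "true",
    "facet.field": "cat",  # assumed field name
}
query_string = urlencode(params)
print("http://localhost:8983/solr/select/?" + query_string)
```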
+ 
+ === qf (Query Fields) ===
+ List of fields and the "boosts" to associate with each of them when building !DisjunctionMaxQueries
from the user's query.  The format supported is {{{fieldOne^2.3 fieldTwo fieldThree^0.4}}},
which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree
has a boost of 0.4 ... this indicates that matches in fieldOne are much more significant than
matches in fieldTwo, which are more significant than matches in fieldThree.
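+ 
+ The format can be read mechanically, as this small sketch shows. The helper name `parse_field_boosts` is chosen for this example, and the default boost of 1.0 for fields without an explicit value is an assumption, since the text above only calls it "the default boost".

```python
from typing import Dict


def parse_field_boosts(spec: str, default_boost: float = 1.0) -> Dict[str, float]:
    """Split a whitespace-separated qf string into a field -> boost mapping."""
    boosts: Dict[str, float] = {}
    for item in spec.split():
        field, caret, boost = item.partition("^")
        # A field without "^value" falls back to the (assumed) default boost.
        boosts[field] = float(boost) if caret else default_boost
    return boosts


print(parse_field_boosts("fieldOne^2.3 fieldTwo fieldThree^0.4"))
```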
+ 
+ === mm (Minimum 'Should' Match) ===
+ When dealing with queries there are 3 types of "clauses" that Lucene knows about: mandatory,
prohibited, and 'optional' (aka: "SHOULD").  By default all words or phrases specified in the
"q" param are treated as "optional" clauses unless they are preceded by a "+" or a "-".
When dealing with these "optional" clauses, the "mm" option makes it possible to say that
a certain minimum number of those clauses must match.  Specifying this minimum number
can be done in complex ways, equating to ideas like...
+ 
+  * At least 2 of the optional clauses must match, regardless of how many clauses there are:
"{{{2}}}"
+  * At least 75% of the optional clauses must match, rounded down: "{{{75%}}}"
+  * If there are fewer than 3 optional clauses, they all must match; if there are 3 or more,
then 75% must match, rounded up: "{{{2<-25%}}}"
+  * If there are fewer than 3 optional clauses, they all must match; for 3 to 5 clauses, one
less than the number of clauses must match; for 6 or more clauses, 80% must match, rounded
down:  "{{{2<-1 5<80%}}}"
+ 
+ The full variety of supported expressions is explained in detail [[http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html|here]].
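+ 
+ The rules listed above can be approximated in a few lines. This is a sketch written from the descriptions on this page, not a copy of Solr's implementation; the function name `min_should_match` is an assumption, and malformed specs are not handled.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many of clause_count optional clauses must match for an mm spec."""
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional form such as "2<-1 5<80%": the last bound that is
        # exceeded decides; if none is exceeded, all clauses are required.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero, so positive percentages round down and
        # negative ones (clauses allowed to be missing) round the requirement up.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        number = int(spec)
        result = clause_count + number if number < 0 else number
    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(clause_count, result))


for count in (2, 4, 6, 10):
    print(count, min_should_match(count, "2<-1 5<80%"))
```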
+ 
+ <!> From [[Solr4.0]] The default value of mm is dictated by the q.op param (q.op=AND
=> mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your
schema.xml <solrQueryParser defaultOperator="xxx"/> entry. In older versions of Solr
the default value of 'mm' is 100% (all clauses must match).
+ 
+ === qs (Query Phrase Slop) ===
+ Amount of slop on phrase queries explicitly included in the user's query string (in qf fields;
affects matching).  <!> [[Solr1.3]]
+ 
+ === pf (Phrase Fields) ===
+ Once the list of matching documents has been identified using the "fq" and "qf" params,
the "pf" param can be used to "boost" the score of documents in cases where all of the terms
in the "q" param appear in close proximity.
+ 
+ The format is the same as the "qf" param: a list of fields and "boosts" to associate with
each of them when making phrase queries out of the entire "q" param.
+ 
+ === ps (Phrase Slop) ===
+ Amount of slop on phrase queries built for "pf" fields (affects boosting).
+ 
+ === pf2 (Phrase bigram fields) ===
+ As with 'pf' but chops the input into bi-grams, e.g. "the brown fox jumped" is queried as
"the brown" "brown fox" "fox jumped"
+ 
+ === ps2 (Phrase bigram slop) ===
+ As with 'ps' but controls the slop factor for 'pf2'
+ 
+ === pf3 (Phrase trigram fields) ===
+ As with 'pf' but chops the input into tri-grams, e.g. "the brown fox jumped" is queried
as "the brown fox" "brown fox jumped"
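+ 
+ The word chopping used by pf2 and pf3 can be illustrated with a short helper. The name `shingles` is an assumption for this sketch, and it splits on whitespace only, whereas the real parser works on the analyzed query.

```python
from typing import List


def shingles(text: str, size: int) -> List[str]:
    """Return every run of `size` consecutive words in text."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(shingles("the brown fox jumped", 2))
print(shingles("the brown fox jumped", 3))
```
+ Input shorter than the requested size yields no phrases at all in this sketch.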
+ 
+ === ps3 (Phrase trigram slop) ===
+ As with 'ps' but controls the slop factor for 'pf3'
+ 
+ === tie (Tie breaker) ===
+ Float value to use as tiebreaker in !DisjunctionMaxQueries (should be something much less
than 1)
+ 
+ When a term from the user's input is tested against multiple fields, more than one field
may match and each field will generate a different score based on how common that word is
in that field (for each document relative to all other documents). By default the score from
the field with the maximum score is used.  If two documents both have the same maximum score, the
tie parameter has the effect of breaking the tie.
+ When a tie parameter is specified the scores from other matching fields are added to the
score of the maximum scoring field: 
+ 
+ (score of matching clause with the highest score) + ( (tie parameter) * (scores of any
other matching clauses) )
+ 
+ The "tie" param lets you configure how much the final score of the query will be influenced
by the scores of the lower scoring fields compared to the highest scoring field.
+ 
+ A value of "0.0" makes the query a pure "disjunction max query" -- only the maximum scoring
sub query contributes to the final score.  A value of "1.0" makes the query a pure "disjunction
sum query" where it doesn't matter what the maximum scoring sub query is, the final score
is the sum of the sub scores.  Typically a low value (e.g. 0.1) is useful.
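+ 
+ The scoring rule can be written out directly. The function name `dismax_score` and the sample per-field scores are assumptions for illustration; real scores come from Lucene.

```python
from typing import Sequence


def dismax_score(field_scores: Sequence[float], tie: float = 0.0) -> float:
    """Combine per-field scores: the best one, plus tie times all the others."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


scores = [2.0, 1.0, 0.5]
for tie in (0.0, 0.5, 1.0):
    print(tie, dismax_score(scores, tie))
```
+ With tie set to 0.0 only the best field counts, and with 1.0 the result equals the plain sum of all field scores.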
+ 
+ === bq (Boost Query) ===
+ A raw query string (in the SolrQuerySyntax) that will be included with the user's query
to influence the score.  If this is a !BooleanQuery with a default boost (1.0f) then the individual
clauses will be added directly to the main query. Otherwise, the query will be included as
is.
+ 
+ /!\ :TODO: /!\  That latter part is deprecated behavior but still works.  It can be problematic
so avoid it.
+ 
+ === bf (Boost Function, additive) ===
+ [[FunctionQuery|Functions]] (with optional boosts) that will be included in the user's query
to influence the score.  Any function supported natively by Solr can be used, along with a
boost value, e.g.: recip(rord(myfield),1,2,3)^1.5
+ 
+ Specifying functions with the "bf" param is just shorthand for using the {{{_val_:"...function..."}}}
syntax in a "bq" param.
+ 
+ For example, if you want to show more recent documents first, use recip(ms(NOW,mydatefield),3.16e-11,1,1).
See FunctionQuery for more functions.
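+ 
+ To see why those constants favour recent documents, the sketch below evaluates recip(x,m,a,b) = a / (m*x + b) for a few document ages. It models only the arithmetic; `MS_PER_YEAR` is a helper constant introduced for this example.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


# Approximate number of milliseconds in a year (helper for this example).
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

for years in (0, 1, 2, 10):
    age_ms = years * MS_PER_YEAR
    print(years, round(recip(age_ms, 3.16e-11, 1, 1), 3))
```
+ Because 3.16e-11 is roughly one divided by the number of milliseconds in a year, a brand new document gets a value of 1.0 and a one year old document gets roughly half of that.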
+ 
+ The bf parameter may be specified multiple times.
+ 
+ === boost (Boost Function, multiplicative) ===
+ As for 'bf' but multiplies the boost into the score.
+ 
+ === uf (User Fields) ===
+ Specifies which schema fields the end user shall be allowed to query for explicitly. This
parameter supports wildcards.
+ 
+ The default is to allow all fields, equivalent to {{{&uf=*}}}. To allow only the title field,
use {{{&uf=title}}}. To allow title and all fields ending with _s, use {{{&uf=title *_s}}}.
To allow all fields except title, use {{{&uf=* -title}}}.
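+ 
+ The matching rules in these examples can be modelled with shell-style wildcards. This is a simplified sketch, not Solr's implementation: the helper name `is_allowed` is invented here, and letting exclusions always win over inclusions is an assumption.

```python
from fnmatch import fnmatchcase


def is_allowed(field: str, uf: str = "*") -> bool:
    """Check whether a user may query `field` under a whitespace-separated uf spec."""
    patterns = uf.split()
    # A leading "-" marks an exclusion; exclusions win over inclusions in this sketch.
    if any(fnmatchcase(field, p[1:]) for p in patterns if p.startswith("-")):
        return False
    return any(fnmatchcase(field, p) for p in patterns if not p.startswith("-"))


print(is_allowed("name_s", "title *_s"))
print(is_allowed("title", "* -title"))
```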
+ 
+ == Examples ==
+ /!\ :TODO: /!\ cleanup and expand examples
+ 
+ Search across multiple fields, specifying (via boosts) how important each field is relative
to each other:
+ 
+ {{{
+ http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3
+ }}}
+ You can boost results that have a field that matches a specific value...
+ 
+ {{{
+ http://localhost:8983/solr/select/?q=video&defType=edismax&qf=features^20.0+text^0.3&bq=cat:electronics^5.0
+ }}}
+ Using the "mm" param with a value of 2, at least two optional clauses must match: 1 and 2 word
queries require that all of the optional clauses match, while a three word query allows one missing clause...
+ 
+ {{{
+ http://localhost:8983/solr/select/?q=belkin+ipod&defType=edismax&mm=2
+ http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&defType=edismax&mm=2
+ http://localhost:8983/solr/select/?q=belkin+ipod+apple&defType=edismax&mm=2
+ }}}
+ 
  
  == References ==
  * [[https://issues.apache.org/jira/browse/SOLR-2368|SOLR-2368]] tracks improvements to eDisMax
