lucene-solr-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: [Solr Wiki] Update of "FunctionQuery" by GrantIngersoll
Date Mon, 16 Nov 2009 01:16:52 GMT
Let's all try to summarize changes to the wiki as we would changes to
the code - without that it's tough to tell what the changes are.

In this particular case, I'm not sure if all of the formatting changes
were deliberate or accidental.  If accidental, I wonder if the cause
was a bug in GUI mode, or a bug in your browser?

-Yonik
http://www.lucidimagination.com



On Sat, Nov 14, 2009 at 9:05 AM, Apache Wiki <wikidiffs@apache.org> wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
>
> The "FunctionQuery" page has been changed by GrantIngersoll.
> http://wiki.apache.org/solr/FunctionQuery?action=diff&rev1=29&rev2=30
>
> --------------------------------------------------
>
> - FunctionQuery allows one to use the actual value of a numeric field and functions of
those fields in a relevancy score.
> + FunctionQuery allows one to use the actual value of a numeric field and functions of
those fields in a relevancy score.
>
>  <<TableOfContents>>
>
>  = Using FunctionQuery =
>  There are a few ways to use FunctionQuery from Solr's HTTP interface:
> +
>   1. Embed a FunctionQuery in a regular query expressed in SolrQuerySyntax via the _val_
hook
>   1. Use the FunctionQParserPlugin, ie: {{{q={!func}log(foo)}}}
>   1. Use a parameter that has an explicit type of FunctionQuery, such as DisMaxRequestHandler's
'''bf''' (boost function) parameter.
> -     * NOTE: the '''bf''' parameter actually takes a list of function queries separated
by whitespace and each with an optional boost.  Make sure to eliminate any internal whitespace
in single function queries when using '''bf'''.
> +   * NOTE: the '''bf''' parameter actually takes a list of function queries separated
by whitespace and each with an optional boost.  Make sure to eliminate any internal whitespace
in single function queries when using '''bf'''.
> -     * Example: {{{q=foo&bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"}}}
> +   * Example: {{{q=foo&bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"}}}
>
>  See SolrPlugins#ValueSourceParser for information on how to hook in your own FunctionQuery.
>
> @@ -20, +21 @@
>
>  There is currently no infix parser - functions must be expressed as function calls
(e.g. sum(a,b) instead of a+b)
>
>  = Available Functions =
> -
>  == constant ==
> - <!> [[Solr1.3]]
> - Floating point constants.
> + <!> [[Solr1.3]] Floating point constants.
> +
> -     Example Syntax: '''1.5'''
> +  . Example Syntax: '''1.5'''
> -
> -     SolrQuerySyntax Example: '''_val_:1.5'''
> +  SolrQuerySyntax Example: '''_val_:1.5'''
>
>  == fieldvalue ==
>  This function returns the numeric field value of an indexed field with a maximum of
one value per document (not multiValued).  The syntax is simply the field name by itself.
 0 is returned for documents without a value in the field.
> +
> -     Example Syntax: '''myFloatField'''
> +  . Example Syntax: '''myFloatField'''
> -
> -     SolrQuerySyntax Example: '''_val_:myFloatField'''
> +  SolrQuerySyntax Example: '''_val_:myFloatField'''
>
>  == ord ==
>  ord(myfield) returns the ordinal of the indexed field value within the indexed list
of terms for that field in lucene index order (lexicographically ordered by unicode value),
starting at 1. In other words, for a given field, all values are ordered lexicographically;
this function then returns the offset of a particular value in that ordering. The field must
have a maximum of one value per document (not multiValued).  0 is returned for documents
without a value in the field.
> +
> -    Example: If there were only three values for a particular field: "apple","banana","pear",
then ord("apple")=1, ord("banana")=2, ord("pear")=3
> +  . Example: If there were only three values for a particular field: "apple","banana","pear",
then ord("apple")=1, ord("banana")=2, ord("pear")=3
> -
> -    Example Syntax: '''ord(myIndexedField)'''
> +  Example Syntax: '''ord(myIndexedField)'''
> -
> -    Example SolrQuerySyntax: '''_val_:"ord(myIndexedField)"'''
> +  Example SolrQuerySyntax: '''_val_:"ord(myIndexedField)"'''
>
> + WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must
use a FieldCache entry at the top level reader, while sorting and function queries now use
entries at the segment level.  Hence sorting or using a different function query, in addition
to ord()/rord() will double memory use.
> - WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must
use a FieldCache entry
> - at the top level reader, while sorting and function queries now use entries at the
segment level.  Hence sorting
> - or using a different function query, in addition to ord()/rord() will double memory
use.
> -
>
>  WARNING: ord() depends on the position in an index and can thus change when other documents
are inserted or deleted, or if a !MultiSearcher is used.
>
>  == rord ==
>  The reverse ordering of what ord provides.
> +
> -     Example Syntax: '''rord(myIndexedField)'''
> +  . Example Syntax: '''rord(myIndexedField)'''
> -
> -     Example: '''rord(myDateField)''' is a metric for how old a document is: the youngest
document will return 1, the oldest document will return the total number of documents.
> +  Example: '''rord(myDateField)''' is a metric for how old a document is: the youngest
document will return 1, the oldest document will return the total number of documents.
>
> + WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must
use a FieldCache entry at the top level reader, while sorting and function queries now use
entries at the segment level.  Hence sorting or using a different function query, in addition
to ord()/rord() will double memory use.
> -
> - WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must
use a FieldCache entry
> - at the top level reader, while sorting and function queries now use entries at the
segment level.  Hence sorting
> - or using a different function query, in addition to ord()/rord() will double memory
use.
>
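For anyone skimming the diff, the ord()/rord() semantics described in the page can be sketched in a few lines of Python. This is only an illustration, not Solr code: the helper names are made up, Python's default string sort stands in for Lucene index order, rord is computed over distinct terms, and returning 0 from rord for a missing value is an assumption carried over from the ord description.

```python
def build_ordinals(values):
    """Map each distinct value to its 1-based position in sorted order."""
    terms = sorted(set(values))
    return {term: i + 1 for i, term in enumerate(terms)}


def ord_(ordinals, value):
    # Documents without a value get 0, as the wiki text states for ord().
    return ordinals.get(value, 0)


def rord(ordinals, value):
    # Reverse ordering: the last term in sorted order gets 1.
    # Returning 0 for a missing value is an assumption of this sketch.
    if value not in ordinals:
        return 0
    return len(ordinals) - ordinals[value] + 1


ordinals = build_ordinals(["banana", "apple", "pear", "apple"])
for term in ("apple", "banana", "pear"):
    print(term, ord_(ordinals, term), rord(ordinals, term))
```

The sketch also shows why the second WARNING applies: adding or removing a term changes the sorted list, so every ordinal after it shifts.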
>  == sum ==
> - <!> [[Solr1.3]]
> - sum(x,y,...) returns the sum of multiple functions.
> + <!> [[Solr1.3]] sum(x,y,...) returns the sum of multiple functions.
> +
> -     Example Syntax: '''sum(x,1)'''
> +  . Example Syntax: '''sum(x,1)'''
> -
> -     Example Syntax: '''sum(x,y)'''
> +  Example Syntax: '''sum(x,y)'''
> -
> -     Example Syntax: '''sum(sqrt(x),log(y),z,0.5)'''
> +  Example Syntax: '''sum(sqrt(x),log(y),z,0.5)'''
>
>  == sub ==
> - <!> [[Solr1.4]]
> - sub(x,y) returns x-y
> + <!> [[Solr1.4]] sub(x,y) returns x-y
> +
> -     Example: '''sub(myfield,myfield2)'''
> +  . Example: '''sub(myfield,myfield2)'''
> -
> -     Example: '''sub(100,sqrt(myfield))'''
> +  Example: '''sub(100,sqrt(myfield))'''
>
>  == product ==
> - <!> [[Solr1.3]]
> - product(x,y,...) returns the product of multiple functions.
> + <!> [[Solr1.3]] product(x,y,...) returns the product of multiple functions.
> +
> -     Example Syntax: '''product(x,2)'''
> +  . Example Syntax: '''product(x,2)'''
> -
> -     Example Syntax: '''product(x,y)'''
> +  Example Syntax: '''product(x,y)'''
>
>  == div ==
> - <!> [[Solr1.3]]
> - div(x,y) divides the function x by the function y.
> + <!> [[Solr1.3]] div(x,y) divides the function x by the function y.
> +
> -     Example Syntax: '''div(1,x)'''
> +  . Example Syntax: '''div(1,x)'''
> -
> -     Example Syntax: '''div(sum(x,100),max(y,1))'''
> +  Example Syntax: '''div(sum(x,100),max(y,1))'''
>
>  == pow ==
> - <!> [[Solr1.3]]
> - pow(x,y) raises the base x to the power y.
> + <!> [[Solr1.3]] pow(x,y) raises the base x to the power y.
> +
> -     Example Syntax: '''pow(x,0.5)'''   same as sqrt
> +  . Example Syntax: '''pow(x,0.5)'''   same as sqrt
> -
> -     Example Syntax: '''pow(x,log(y))'''
> +  Example Syntax: '''pow(x,log(y))'''
>
>  == abs ==
> - <!> [[Solr1.3]]
> - abs(x) returns the absolute value of a function.
> + <!> [[Solr1.3]] abs(x) returns the absolute value of a function.
> +
> -     Example Syntax: '''abs(-5)'''
> +  . Example Syntax: '''abs(-5)'''
> -
> -     Example Syntax: '''abs(x)'''
> +  Example Syntax: '''abs(x)'''
>
>  == log ==
> - <!> [[Solr1.3]]
> - log(x) returns log base 10 of the function x.
> + <!> [[Solr1.3]] log(x) returns log base 10 of the function x.
> +
> -     Example Syntax: '''log(x)'''
> +  . Example Syntax: '''log(x)'''
> -
> -     Example Syntax: '''log(sum(x,100))'''
> +  Example Syntax: '''log(sum(x,100))'''
>
>  == sqrt ==
> - <!> [[Solr1.3]]
> - sqrt(x) returns the square root of the function x
> + <!> [[Solr1.3]] sqrt(x) returns the square root of the function x
> +
> -     Example Syntax: '''sqrt(2)'''
> +  . Example Syntax: '''sqrt(2)'''
> -
> -     Example Syntax: '''sqrt(sum(x,100))'''
> +  Example Syntax: '''sqrt(sum(x,100))'''
>
>  == map ==
> - <!> [[Solr1.3]]
> - map(x,min,max,target) maps any values of the function x that fall within min and max
inclusive to target.  min,max,target are constants. It outputs the field's value if it does
not fall between min and max.
> + <!> [[Solr1.3]] map(x,min,max,target) maps any values of the function x that
fall within min and max inclusive to target.  min,max,target are constants. It outputs the
field's value if it does not fall between min and max.
> +
> -     Example Syntax 1: '''map(x,0,0,1)'''  change any values of 0 to 1... useful
in handling default 0 values
> +  . Example Syntax 1: '''map(x,0,0,1)'''  change any values of 0 to 1... useful in
handling default 0 values
> -
> -     Example Syntax 2 <!> [[Solr1.4]]: '''map(x,0,0,1,0)'''  change any values
of 0 to 1, and if the value is not zero it can be set to the value of the 5th argument instead
of defaulting to the field's value
> +  Example Syntax 2 <!> [[Solr1.4]]: '''map(x,0,0,1,0)'''  change any values
of 0 to 1, and if the value is not zero it can be set to the value of the 5th argument instead
of defaulting to the field's value
> -
> -
> -
>
>  == scale ==
> - <!> [[Solr1.3]]
> - scale(x,minTarget,maxTarget) scales values of the function x such that they fall between
minTarget and maxTarget inclusive.
> + <!> [[Solr1.3]] scale(x,minTarget,maxTarget) scales values of the function x
such that they fall between minTarget and maxTarget inclusive.
> -     Example Syntax: '''scale(x,1,2)'''  all values will be between 1 and 2 inclusive.
>
> -     NOTE: The current implementation currently traverses all of the function values
to obtain the min and max so it can pick the correct scale.
> +  . Example Syntax: '''scale(x,1,2)'''  all values will be between 1 and 2 inclusive.
NOTE: The current implementation currently traverses all of the function values to obtain
the min and max so it can pick the correct scale.
> -
> -     NOTE: This implementation currently cannot distinguish when documents have been
deleted or documents that have no value, and 0.0 values will be used for these cases.  This
means that if values are normally all greater than 0.0, one can still end up with 0.0 as the
min value to map from.  In these cases, an appropriate map() function could be used as a
workaround to change 0.0 to a value in the real range.  example: '''scale(map(x,0,0,5),1,2)'''
> +  NOTE: This implementation currently cannot distinguish when documents have been deleted
or documents that have no value, and 0.0 values will be used for these cases.  This means
that if values are normally all greater than 0.0, one can still end up with 0.0 as the min
value to map from.  In these cases, an appropriate map() function could be used as a workaround
to change 0.0 to a value in the real range.  example: '''scale(map(x,0,0,5),1,2)'''
>
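The interaction between map() and scale() in the NOTE above is easier to see with numbers. The following is a minimal Python sketch of the behaviour as the page describes it, not the Solr implementation. The function names, the sample values, and the handling of the all-values-equal case are assumptions.

```python
def map_value(x, lo, hi, target, default=None):
    """map(x,min,max,target[,default]): values in [lo, hi] become target."""
    if lo <= x <= hi:
        return target
    # Four-argument form keeps x; five-argument form returns the default.
    return x if default is None else default


def scale(values, min_target, max_target):
    """Linearly rescale all values into [min_target, max_target]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: an assumption of this sketch.
        return [float(min_target) for _ in values]
    factor = (max_target - min_target) / (hi - lo)
    return [(v - lo) * factor + min_target for v in values]


# 0.0 stands in for a deleted document or a document with no value.
raw = [0.0, 5.0, 10.0, 15.0]
print(scale(raw, 1, 2))                     # 0.0 is treated as the minimum

patched = [map_value(v, 0, 0, 5) for v in raw]
print(scale(patched, 1, 2))                 # real minimum 5.0 now maps to 1
```

With the raw values, the placeholder 0.0 becomes the bottom of the range, so the real minimum 5.0 lands above 1. After map(x,0,0,5), the real minimum maps to 1 as intended.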
>  == query ==
> - <!> [[Solr1.4]]
> - query(subquery, default) returns the score for the given subquery, or the default value
for documents not matching the query.  Any type of subquery is supported through either parameter
dereferencing {{{$otherparam}}} or direct specification of the query string in the LocalParams
via "v".
> + <!> [[Solr1.4]] query(subquery, default) returns the score for the given subquery,
or the default value for documents not matching the query.  Any type of subquery is supported
through either parameter dereferencing {{{$otherparam}}} or direct specification of the query
string in the LocalParams via "v".
>
> -     Example Syntax: '''q=product(popularity, query({!dismax v='solr rocks'}))''' returns
the product of the popularity and the score of the dismax query.
> +  . Example Syntax: '''q=product(popularity, query({!dismax v='solr rocks'}))''' returns
the product of the popularity and the score of the dismax query.
> -
> -     Example Syntax: '''q=product(popularity, query($qq))&qq={!dismax}solr rocks'''
is equivalent to the previous query, using param dereferencing.
> +  Example Syntax: '''q=product(popularity, query($qq))&qq={!dismax}solr rocks'''
is equivalent to the previous query, using param dereferencing.
> -
> -     Example Syntax: '''q=product(popularity, query($qq,0.1))&qq={!dismax}solr
rocks''' specifies a default score of 0.1 for documents that don't match the dismax query.
> +  Example Syntax: '''q=product(popularity, query($qq,0.1))&qq={!dismax}solr rocks'''
specifies a default score of 0.1 for documents that don't match the dismax query.
>
>  == linear ==
>  linear(x,m,c) implements m*x+c where m and c are constants and x is an arbitrary function.
 This is equivalent to '''sum(product(m,x),c)''', but slightly more efficient as it is implemented
as a single function.
> +
> -     Example Syntax: '''linear(x,2,4)'''  returns 2*x+4
> +  . Example Syntax: '''linear(x,2,4)'''  returns 2*x+4
>
>  == recip ==
>  A reciprocal function with '''recip(x,m,a,b)''' implementing a/(m*x+b).  m,a,b are
constants, x is any numeric field or arbitrarily complex function.
>
>  When a and b are equal, and x>=0, this function has a maximum value of 1 that drops
as x increases. Increasing the value of a and b together results in a movement of the entire
function to a flatter part of the curve. These properties can make this an ideal function
for boosting more recent documents when x is rord(datefield).
> +
> -     Example Syntax: '''recip(rord(creationDate),1,1000,1000)'''
> +  . Example Syntax: '''recip(rord(creationDate),1,1000,1000)'''
>
> - <!> [[Solr1.4]]
> - In Solr 1.4 and later, best practice is to avoid ord() and rord() and derive the boost
directly from the value of the date field.
> + <!> [[Solr1.4]] In Solr 1.4 and later, best practice is to avoid ord() and rord()
and derive the boost directly from the value of the date field. See ms() for more details.
> - See ms() for more details.
>
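To make the shape of recip concrete, here is a small Python sketch of the formula a/(m*x+b) given above, using the constants from the wiki example. It is plain arithmetic rather than Solr code, and the sample x values are arbitrary.

```python
def recip(x, m, a, b):
    """recip(x,m,a,b) = a / (m*x + b), as defined in the wiki text."""
    return a / (m * x + b)


# With a == b and x >= 0 the value starts at 1 and falls as x grows.
for x in (0, 1000, 9000):
    print(x, recip(x, 1, 1000, 1000))
```

With a = b = 1000, x = 0 gives 1, x = 1000 gives one half, and x = 9000 gives one tenth, which is the gently decaying curve the page recommends for recency boosts.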
>  == max ==
>  max(x,c) returns the max of another function and a constant.  Useful for "bottoming
out" another function at some constant.
> +
> -     Example Syntax: '''max(myfield,0)'''
> +  . Example Syntax: '''max(myfield,0)'''
>
>  == ms ==
>  <!> [[Solr1.4]]
> @@ -175, +149 @@
>
>  Arguments may be numerically indexed date fields such as !TrieDate (the default in
1.4), or date math (examples in SolrQuerySyntax) based on a constant date or '''NOW'''.
>
>  '''ms()'''
> +
> -   Equivalent to '''ms(NOW)''', number of milliseconds since the epoch.
> +  . Equivalent to '''ms(NOW)''', number of milliseconds since the epoch.
> +
>  '''ms(a)'''
> +
> -   Returns the number of milliseconds since the epoch that the argument represents.
> +  . Returns the number of milliseconds since the epoch that the argument represents.
> -
> -   Example: '''ms(NOW/DAY)'''
> +  Example: '''ms(NOW/DAY)'''
> -
> -   Example: '''ms(2000-01-01T00:00:00Z)'''
> +  Example: '''ms(2000-01-01T00:00:00Z)'''
> -
> -   Example: '''ms(mydatefield)'''
> +  Example: '''ms(mydatefield)'''
> +
>  '''ms(a,b)'''
> +
> -   Returns the number of milliseconds that {{{b}}} occurs before {{{a}}} (i.e. {{{a
- b}}}).  Note that this offers higher precision than '''sub(a,b)''' because the arguments
are not converted to floating point numbers before subtraction.
> +  . Returns the number of milliseconds that {{{b}}} occurs before {{{a}}} (i.e. {{{a
- b}}}).  Note that this offers higher precision than '''sub(a,b)''' because the arguments
are not converted to floating point numbers before subtraction.
> -
> -   Example: '''ms(NOW,mydatefield)'''
> +  Example: '''ms(NOW,mydatefield)'''
> -
> -   Example: '''ms(mydatefield,2000-01-01T00:00:00Z)'''
> +  Example: '''ms(mydatefield,2000-01-01T00:00:00Z)'''
> -
> -   Example: '''ms(datefield1,datefield2)'''
> +  Example: '''ms(datefield1,datefield2)'''
> +
> + == dist ==
> + [[Solr1.5]]
> +
> + Return the Distance between two Vectors (points) in an n-dimensional space.  See http://en.wikipedia.org/wiki/Lp_space
for more information.  Takes in the power, plus two or more !ValueSource instances and calculates
the distances between the two vectors.  Each !ValueSource must be a number.
> +
> + Common cases:
> +
> +  ||<tablestyle="width: 467px; height: 88px;">Power ||Common Name ||
> +  ||0 ||Sparseness calculation ||
> +  ||1||Manhattan (taxicab) Distance||
> +  ||2||Euclidean Distance||
> +  ||Infinite||Infinite norm - maximum value in the vector||
> +
> +
>
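The table of powers in the new dist section can be illustrated with a short Python sketch. This is an assumption-laden illustration of the Lp distances named there, not the Solr 1.5 code: the function signature and the reading of power 0 as "count of coordinates that differ" are guesses based on the table's wording.

```python
import math


def dist(power, v1, v2):
    """Lp distance between two equal-length vectors of numbers."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of dimensions")
    diffs = [abs(a - b) for a, b in zip(v1, v2)]
    if power == 0:
        # Sparseness: number of coordinates that differ (assumed reading).
        return float(sum(1 for d in diffs if d != 0))
    if math.isinf(power):
        # Infinite norm: largest coordinate difference.
        return float(max(diffs))
    return sum(d ** power for d in diffs) ** (1.0 / power)


origin, point = (0, 0), (3, 4)
print(dist(1, origin, point))         # Manhattan distance
print(dist(2, origin, point))         # Euclidean distance
print(dist(math.inf, origin, point))  # infinite norm
print(dist(0, (1, 2, 3), (1, 5, 3)))  # sparseness
```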
>  === Date Boosting ===
>  Boosting more recent content is a common use case.  One way is to use a {{{recip}}}
function in conjunction with {{{ms}}}.
> @@ -203, +191 @@
>
>  Also see http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>
>  == top ==
> - <!> [[Solr1.4]]
> - Causes its function query argument to derive its values from the top-level IndexReader
containing all parts of an index.  For example, the ordinal of a value in a single segment
will be different from the ordinal of that same value in the complete index.  The ord() and
rord() functions implicitly use top() and hence ord(foo) is equivalent to top(ord(foo)).
> + <!> [[Solr1.4]] Causes its function query argument to derive its values from
the top-level IndexReader containing all parts of an index.  For example, the ordinal of
a value in a single segment will be different from the ordinal of that same value in the complete
index.  The ord() and rord() functions implicitly use top() and hence ord(foo) is equivalent
to top(ord(foo)).
>
>  = General Example =
> -
> - To give more idea about the use of the function query, suppose index stores dimensions
in meters '''x''', '''y''','''z''' of some hypothetical boxes with arbitrary names stored
in field '''boxname'''.
> + To give more idea about the use of the function query, suppose index stores dimensions
in meters '''x''', '''y''','''z''' of some hypothetical boxes with arbitrary names stored
in field '''boxname'''. Suppose we want to search for box matching name ''findbox'' but ranked
according to volumes of boxes, the query params would be:
> - Suppose we want to search for box matching name ''findbox'' but ranked according to
volumes of boxes, the query params would be:
> +
>  {{{
>    q=boxname:findbox+_val_:"product(product(x,y),z)"
>  }}}
> -
>  Although this will rank the results based on volume, in order to get the computed
volume you will need to add the parameter...
> +
>  {{{
>    &fl=*,score
>  }}}
> -
>  ...where '''score''' will contain the resultant volume.
>
>  Suppose you also have a field containing weight of the box as 'weight', then to sort
by the density of the box and return the value of the density in score your query should be...
> @@ -226, +211 @@
>
>  {{{
>  http://localhost:8983/solr/select/?q=boxname:findbox+_val_:"div(weight,product(product(x,y),z))"&fl=boxname,x,y,z,weight,score
>  }}}
> -
>
>
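The request URLs in the General Example contain spaces, quotes and parentheses, so a client has to escape them. Below is a minimal Python sketch of assembling such a request with the standard library. The helper name is hypothetical; the host, path, field names and function queries are the ones quoted in the diff.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as it appears in the wiki example.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def box_query_url(name, function_query, fields="*,score"):
    """Build a select URL that ranks boxes by the given function query."""
    params = {
        # The space between the two clauses is encoded as '+' by urlencode.
        "q": f'boxname:{name} _val_:"{function_query}"',
        "fl": fields,
    }
    return SOLR_SELECT + "?" + urlencode(params)


volume_url = box_query_url("findbox", "product(product(x,y),z)")
density_url = box_query_url(
    "findbox",
    "div(weight,product(product(x,y),z))",
    fields="boxname,x,y,z,weight,score",
)
print(volume_url)
print(density_url)

# Round trip: decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(density_url).query)
print(decoded["q"][0])
```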
