lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "FieldAliasesAndGlobsInParams" by FrankWesemann
Date Fri, 26 Feb 2010 04:40:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FieldAliasesAndGlobsInParams" page has been changed by FrankWesemann.
http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams?action=diff&rev1=6&rev2=7

--------------------------------------------------

This was prompted by some ideas put forth in [[https://issues.apache.org/jira/browse/SOLR-247|SOLR-247]] and in the mailing list threads linked to from that issue. See also: [[https://issues.apache.org/jira/browse/SOLR-456|SOLR-456]]
  
For now this is a brainstorming page; if/when any of this gets implemented, it can be reworked into a documentation page for users.
  
= Background =
Currently the [[CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab|fl]] param supports two "special" field names: "*", which means "any stored field", and "score", which not only means "include the score in the response" but also informs the request handler that scores should be computed. The fl param is split on the regex Pattern ",| " (a comma or a single space).
  
The splitting happens in !SolrPluginUtils.setReturnFields, which parses exactly one "fl" string param and sets the field list on the !SolrQueryResponse, as well as returning whether or not the list contained "score", so the handler has that info to work with.
  
Small problems with this (that most people have never cared about):

 * it makes it hard to use field names containing spaces (or "|" or ","), even though no other code in Solr cares what characters are in field names
 * you can't have a field named "score"
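The splitting rule and both problems can be illustrated with a short Python sketch. It is not Solr's code (the real logic is Java, in !SolrPluginUtils.setReturnFields); `parse_fl` is a hypothetical name, and the sketch only models the behavior described above: split a single fl string on `,| ` and report whether "score" appeared.

```python
import re
from typing import List, Tuple

# Current rule as described above: split on a comma or a single space.
FL_SPLIT = re.compile(r",| ")


def parse_fl(fl: str) -> Tuple[List[str], bool]:
    """Split one fl string and report whether "score" was requested.

    Hypothetical model of the described behavior, not Solr's Java code.
    """
    # Adjacent separators (e.g. ", ") produce empty pieces; drop them.
    names = [name for name in FL_SPLIT.split(fl) if name]
    # "score" always means the relevance score, never a stored field.
    return names, "score" in names


print(parse_fl("name, price score"))  # (['name', 'price', 'score'], True)
print(parse_fl("product name"))       # (['product', 'name'], False)
```

A field literally named "product name" comes back as two names, and a real field named "score" cannot be told apart from a request for the relevance score.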
  
Some people expressed a desire to have "*" work for the facet.field param as well. See SOLR-247 for reasons why this is probably a bad idea; still, more generic glob syntax support (in both the "fl" and "facet.field" params) would be handy.
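A sketch of generic glob expansion for "fl" or "facet.field", written in Python with the standard library's `fnmatch.fnmatchcase`. `expand_globs` is a hypothetical helper, and it assumes the candidate field names are already known; with dynamic fields the schema alone can't enumerate them, so a real implementation would probably have to consult the index as well.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def expand_globs(patterns: Iterable[str], field_names: Iterable[str]) -> List[str]:
    """Return the known field names matching any of the glob patterns.

    A pattern without wildcards simply matches the field of that name.
    Results keep the order of field_names (assumed to contain no duplicates).
    """
    pats = list(patterns)
    return [f for f in field_names if any(fnmatchcase(f, p) for p in pats)]


fields = ["id", "name", "facet_cat", "facet_brand", "price"]
print(expand_globs(["facet_*"], fields))  # ['facet_cat', 'facet_brand']
```

Note that fnmatch also treats `?` and `[...]` as wildcards, so field names containing those characters would need escaping.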
  
= Related Issues =
 * "fl" can't be used as a multi-valued param
 * no way to prevent certain users from getting certain fields
 * no way to prevent faceting on certain fields
 * in most cases, searching and sorting on the same logical field requires clients to know two different field names (e.g. "q=name:foo&sort=name_sortable+asc")
  
= Broad Idea =
Add robust support for letting Solr admins configure what special syntax or aliases can be used *at query time* to refer to fields based on context (sorting, returned fields, search fields, facet fields, etc.).
  
New syntax in solrconfig.xml (most of which should be configurable on a per-handler basis, probably via a new Component) that lets Solr administrators say things like:
  
 * "this is the regex pattern to be used when processing fieldname-related params"
  * "fl" becomes a multi-valued param
  * default: "|, "
  * if not specified, then fieldname params like fl and facet.field are taken literally (what to do about "sort"?)
 * "for this param, alias this string to this real field"
  * e.g. "sort=name+asc" ultimately sorts on "name_sortable"
 * "for this param, alias this string to the document's score"
  * defaults to "score" for "fl" and "sort"
  * e.g. "fl=name,importance" ... "importance" might be an alias for the score
 * "for this param, take any string that looks like a fieldname and append/prepend this string to it"
  * e.g. any field name specified in the sort param can have "_sort" appended to it
 * "for this param, alias this string to a regex or glob"
  * e.g. "fl=stockFields&fl=priceFields&facet.field=catFields" might mean: return all fields matching two configured regexes, and facet on all fields related to categorization
 * "allow users to specify globs for this param" or "allow users to specify regexes for this param"
  * e.g. if globbing is turned on for facet.field, then "facet.field=facet_*" is legal
  * e.g. if regexes are turned on for "fl", then "fl=name&fl=.*text" is legal
 * "fields (not) matching this glob or regex pattern are to be treated as if they didn't exist when dealing with this param"
  * allows fields to be hidden in various contexts, even if the user guesses/knows they exist
  * e.g. only allow the "sort" param to contain something matching the glob "*_sort"
  * e.g. only return fields matching the regex "name|.*price.*|(short|long)summary", even if the user uses a glob "fl" param (return the intersection of the fields matching the regex and the glob)
 * "for this param, ignore field names that aren't recognized or allowed by the configured rules"
 * "for this param, error if a field name isn't recognized or allowed by the configured rules"
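To make the rules above concrete, here is a minimal Python sketch of a per-param rule set: an optional split regex, aliases (including aliases for the score), a suffix appended to non-aliased names, a visibility glob, and an ignore/error mode. `ParamFieldRules`, `FieldRuleError`, and every option name are hypothetical; they model the ideas on this page, not any existing Solr configuration. The rules are applied in a fixed order here, and the split regex `[|, ]` reads the proposed default "|, " as a set of separator characters, which is an assumption.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional, Set, Tuple


class FieldRuleError(ValueError):
    """Raised for a rejected field name when on_unknown == "error"."""


@dataclass
class ParamFieldRules:
    # Hypothetical options; nothing like these exists in solrconfig.xml today.
    split_regex: Optional[str] = None      # None: take each value literally
    aliases: Dict[str, str] = field(default_factory=dict)
    score_aliases: Tuple[str, ...] = ("score",)
    suffix: str = ""                       # appended to names with no alias
    visible_glob: Optional[str] = None     # non-matching names act as nonexistent
    on_unknown: str = "ignore"             # "error" raises; anything else drops

    def resolve(self, values: Iterable[str],
                existing: Set[str]) -> Tuple[List[str], bool]:
        """Map the raw values of one (multi-valued) param to real field
        names, and report whether the score was requested."""
        raw: List[str] = []
        for value in values:
            pieces = re.split(self.split_regex, value) if self.split_regex else [value]
            raw.extend(p for p in pieces if p)

        fields: List[str] = []
        wants_score = False
        for name in raw:
            if name in self.score_aliases:
                wants_score = True
                continue
            real = self.aliases.get(name, name + self.suffix)
            hidden = (self.visible_glob is not None
                      and not fnmatchcase(real, self.visible_glob))
            if hidden or real not in existing:
                if self.on_unknown == "error":
                    raise FieldRuleError(f"unknown or disallowed field: {name!r}")
                continue  # "ignore" mode: silently drop the name
            fields.append(real)
        return fields, wants_score


existing = {"name", "name_sort", "price", "price_sort", "score"}
sort_rules = ParamFieldRules(aliases={"name": "name_sort"}, suffix="_sort",
                             visible_glob="*_sort")
print(sort_rules.resolve(["name", "price", "internal"], existing))
# (['name_sort', 'price_sort'], False)

fl_rules = ParamFieldRules(split_regex=r"[|, ]", score_aliases=("importance",),
                           on_unknown="error")
print(fl_rules.resolve(["name,importance", "score"], existing))
# (['name', 'score'], True)
```

Because "importance" is configured as the score alias for fl, a real field named "score" can be returned as an ordinary field, which would remove one of the small problems listed under Background.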
  
All of these things should be combinable in an order specified by the Solr admin, so they can say things like: "when dealing with the facet.field param, let users specify regexes to identify the fields to facet on, and map the string "price" to the field "price_dollars_facet", but ultimately ignore any field that doesn't match the glob "*_facet"."
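Letting the admin choose the order can be sketched as a list of steps, each mapping a list of names to a new list, applied left to right. All helper names (`alias`, `expand_regexes`, `keep_glob`, `run_steps`) are hypothetical, and the field names are made up to follow the facet.field example above.

```python
import re
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Sequence

# One rule: takes the current list of names and returns a new list.
Step = Callable[[List[str]], List[str]]


def alias(mapping: Dict[str, str]) -> Step:
    """Replace names that have an alias; leave the rest unchanged."""
    return lambda names: [mapping.get(n, n) for n in names]


def expand_regexes(all_fields: Sequence[str]) -> Step:
    """Treat each name as a regex and expand it to every field it fully matches."""
    def step(names: List[str]) -> List[str]:
        out: List[str] = []
        for pattern in names:
            rx = re.compile(pattern)
            for f in all_fields:
                if rx.fullmatch(f) and f not in out:
                    out.append(f)
        return out
    return step


def keep_glob(glob: str) -> Step:
    """Drop every name that does not match the glob."""
    return lambda names: [n for n in names if fnmatchcase(n, glob)]


def run_steps(steps: Sequence[Step], names: List[str]) -> List[str]:
    """Apply the steps in the order the admin listed them."""
    for step in steps:
        names = step(names)
    return names


all_fields = ["cat_facet", "brand_facet", "price_dollars_facet", "price", "cat"]
facet_steps = [alias({"price": "price_dollars_facet"}),
               expand_regexes(all_fields),
               keep_glob("*_facet")]
print(run_steps(facet_steps, ["price", "cat.*"]))
# ['price_dollars_facet', 'cat_facet']
```

Order matters: if `keep_glob("*_facet")` ran first, "price" would be dropped before it could be aliased to "price_dollars_facet".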
  
= Implementation =
The best way to do this may be to have a Component which can be configured with all of these rules (and reused by multiple handlers). The component would parse the input params, error if necessary, and construct an object, put into the request context, that subsequent Components can call methods on to get field name Sets (or iterators) based on the param name being processed, the schema, the rules defined, and the context of the operation (e.g. dealing with stored fields, dealing with indexed fields, a specific document for returned fields, etc.).
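One possible shape for the object placed in the request context is sketched below in Python. `ResolvedFieldNames`, `SchemaInfo`, the `"fieldNames"` context key, and the context names "stored" and "indexed" are all hypothetical, and a plain dict stands in for the request context; the point is that rules are applied once, up front, and later Components only ask which names apply to a given param in a given context.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Mapping


@dataclass(frozen=True)
class SchemaInfo:
    stored: FrozenSet[str]
    indexed: FrozenSet[str]


class ResolvedFieldNames:
    """Hypothetical object a field-rules Component could put into the
    request context. Names are resolved once, up front; later Components
    only ask which of them apply to a param in a given context."""

    def __init__(self, resolved: Mapping[str, Iterable[str]], schema: SchemaInfo):
        self._resolved = {param: frozenset(names) for param, names in resolved.items()}
        self._by_context = {"stored": schema.stored, "indexed": schema.indexed}

    def fields_for(self, param: str, context: str) -> FrozenSet[str]:
        # Unknown params yield an empty set; an unknown context raises KeyError.
        return self._resolved.get(param, frozenset()) & self._by_context[context]


schema = SchemaInfo(stored=frozenset({"name", "price", "body"}),
                    indexed=frozenset({"name", "price", "cat"}))
request_context: Dict[str, object] = {}
request_context["fieldNames"] = ResolvedFieldNames(
    {"fl": ["name", "price", "cat"], "facet.field": ["cat", "body"]}, schema)

names = request_context["fieldNames"]
print(sorted(names.fields_for("fl", "stored")))           # ['name', 'price']
print(sorted(names.fields_for("facet.field", "indexed")))  # ['cat']
```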
  
It should not be too difficult if one uses the "new" queryParser mechanism from Lucene contrib; its Processor/Builder chain is well suited for these changes. All this aliasing could be configured via a config file or on a per-request basis.
  
