lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml
Date Thu, 24 Mar 2011 04:43:05 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2338:
------------------------------

    Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing (this issue depends
upon it).

Here is the syntax:
{noformat}
  <!--  specify a Similarity classname directly -->
  <fieldType name="sim1" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
    <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
  </fieldType>

  <!--  specify a Similarity factory -->  
  <fieldType name="sim2" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
    <similarity class="org.apache.solr.schema.CustomSimilarityFactory">
      <str name="echo">is there an echo?</str>
    </similarity>
  </fieldType>
{noformat}
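
To make the intent of that syntax concrete, here is a rough, self-contained Java sketch of the lookup it implies: each fieldType name maps to its own Similarity, and anything without a <similarity> element falls back to a default. This is only an illustration and is not the code in the patch. The types below (Similarity, EchoSimilarityFactory, FieldTypeSimilarities) are hypothetical stand-ins, not the Lucene/Solr classes, and the way the "echo" argument is passed in is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldSimilarityDemo {

    /** Hypothetical stand-in for a Lucene Similarity; it only carries a label. */
    public interface Similarity {
        String name();
    }

    /** Hypothetical stand-in for a factory configured by <str name="..."> children. */
    public static final class EchoSimilarityFactory {
        private final Map<String, String> params = new HashMap<>();

        /** Receives the name/value pairs nested under the <similarity> element. */
        public void init(Map<String, String> args) {
            params.putAll(args);
        }

        public Similarity getSimilarity() {
            final String echo = params.get("echo");
            return () -> "custom(echo=" + echo + ")";
        }
    }

    /** Maps a fieldType name to its Similarity, falling back to a default. */
    public static final class FieldTypeSimilarities {
        private final Map<String, Similarity> byFieldType = new HashMap<>();
        private final Similarity fallback;

        public FieldTypeSimilarities(Similarity fallback) {
            this.fallback = fallback;
        }

        public void register(String fieldType, Similarity sim) {
            byFieldType.put(fieldType, sim);
        }

        public Similarity get(String fieldType) {
            return byFieldType.getOrDefault(fieldType, fallback);
        }
    }

    /** Builds the mapping that the "sim1" and "sim2" declarations would imply. */
    public static FieldTypeSimilarities buildExample() {
        FieldTypeSimilarities sims = new FieldTypeSimilarities(() -> "default");

        // sim1: a Similarity class named directly in the schema.
        sims.register("sim1", () -> "SweetSpotSimilarity");

        // sim2: a factory that is initialized with its nested arguments.
        Map<String, String> args = new HashMap<>();
        args.put("echo", "is there an echo?");
        EchoSimilarityFactory factory = new EchoSimilarityFactory();
        factory.init(args);
        sims.register("sim2", factory.getSimilarity());

        return sims;
    }

    public static void main(String[] args) {
        FieldTypeSimilarities sims = buildExample();
        for (String type : new String[] {"sim1", "sim2", "text"}) {
            System.out.println(type + " -> " + sims.get(type).name());
        }
    }
}
```

The point of the sketch is that the mapping is derived from the fieldType declarations, so nobody has to maintain it by hand in a custom provider.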

Additionally, it's necessary to allow customization of the SimilarityProvider too, in order
to customize the non-field-specific stuff like coord()... this is done via:
{noformat}
 <!-- expert: SimilarityProvider contains scoring routines that are not field-specific,
      such as coord() and queryNorm(). most scoring customization happens in the fieldtype.
      A custom similarity provider may be specified here, but the default is fine
      for most applications.
 -->
 <similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
   <str name="echo">is there an echo?</str>
 </similarityProvider>
{noformat}


> improved per-field similarity integration into schema.xml
> ---------------------------------------------------------
>
>                 Key: SOLR-2338
>                 URL: https://issues.apache.org/jira/browse/SOLR-2338
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: SOLR-2338.patch
>
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory
> for the SimilarityProvider.
> In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own
> CustomSimilarityProvider with <similarity class=.../> and manage the per-field mapping yourself in java code.
> Instead I think it would be better if you just specify the Similarity in the FieldType, like after <analyzer>.
> As far as the example, one idea from LUCENE-1360 was to make a "short_text" or "metadata_text" used by the
> various metadata fields in the example that has better norm quantization for its shortness...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

