lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (SOLR-1365) Add configurable Sweetspot Similarity factory
Date Wed, 17 Feb 2010 19:16:28 GMT


Hoss Man commented on SOLR-1365:

The constraints on what can be SolrCoreAware exist for two main reasons:

 # to ensure some sanity in initialization .. one of the main reasons the SolrCoreAware interface
was needed in the first place was because some plugins wanted to use the SolrCore to get access
to other plugins during their initialization -- but those other components weren't necessarily
initialized yet.  with the inform(SolrCore) method SolrCoreAware plugins know that all other
components have been initialized, but they haven't necessarily been informed about the SolrCore,
so they might not be "ready" to deal with other plugins yet ... it's generally just a big
initialization-cluster-fuck, so the fewer classes involved the better
 # prevent too much pollution of the SolrCore API.  having direct access to the SolrCore is
"a big deal" -- once you have a reference to the core, you can get to pretty much anything,
which opens us (ie: Solr maintainers) up to a lot of crazy code paths to worry about -- so
the fewer plugin types that we need to consider when making changes to SolrCore the better.
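
The two-phase startup described in the first point can be sketched roughly as follows. This is only an illustration: Plugin, CoreAware, MiniCore, SimplePlugin and DependentPlugin are hypothetical stand-ins invented for the example, not the real SolrCore or SolrCoreAware classes. The only thing borrowed from Solr is the idea that inform() runs after every plugin has been initialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real Solr API.
interface Plugin {
    void init();
}

interface CoreAware {
    void inform(MiniCore core);
}

class MiniCore {
    private final Map<String, Plugin> plugins = new LinkedHashMap<>();

    void register(String name, Plugin plugin) {
        plugins.put(name, plugin);
    }

    Plugin getPlugin(String name) {
        return plugins.get(name);
    }

    void start() {
        // Phase 1: initialize every plugin, in registration order.
        for (Plugin p : plugins.values()) {
            p.init();
        }
        // Phase 2: only now hand the core to the plugins that asked for it.
        for (Plugin p : plugins.values()) {
            if (p instanceof CoreAware) {
                ((CoreAware) p).inform(this);
            }
        }
    }
}

class SimplePlugin implements Plugin {
    boolean initialized = false;

    public void init() {
        initialized = true;
    }
}

class DependentPlugin extends SimplePlugin implements CoreAware {
    boolean sawInitializedDependency = false;

    public void inform(MiniCore core) {
        // Safe to look up other plugins here: all of them have been init()ed,
        // although they may not have been informed themselves yet.
        SimplePlugin dep = (SimplePlugin) core.getPlugin("dependency");
        sawInitializedDependency = dep != null && dep.initialized;
    }
}
```

Even if "dependent" is registered before "dependency", the lookup inside inform() sees an initialized plugin. What the sketch does not guarantee is that the other plugin has already had its own inform() call, which is the remaining ordering problem mentioned above.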

In the case of SimilarityFactory, i'm not entirely sure how i feel about making it SolrCoreAware(able)
... we have tried really, REALLY hard to make sure nothing initialized as part of the IndexSchema
can be SolrCore aware because it opens up the possibility of plugin behavior being affected
by SolrCore configuration which might be different between master and slave machines -- which
could produce disastrous results.  a schema.xml needs to be internally consistent regardless
of which solrconfig.xml might reference it.

In this case the real issue isn't that we have a use case where SimilarityFactory _needs_
access to SolrCore -- what it wants access to is the IndexSchema, so it might make sense to
just provide access to that in some way w/o having to expose the entire SolrCore.

Practically speaking, after re-skimming the patch: I'm not even convinced that would really
add anything.  refactoring/reusing some of the *code* that IndexSchema uses to manage dynamicFields
might be handy for the SweetSpotSimilarityFactory, but i don't actually see how being able to
inspect the IndexSchema to get the list of dynamicFields (or find out if a field is dynamic)
would make it any better or easier to use.  We'd still want people to configure it with field
names and field name globs directly because there won't necessarily be a one to one correspondence
between what fields are dynamic in the schema and how you want the sweetspots defined ...
you might have a generic "en_*" dynamicField in your schema for english text, and an "fr_*"
dynamicField for french text, but that doesn't mean the sweetspot for all "fr_*" fields will
be the same ... you are just as likely to want some very specific field names to have their
own sweetspot, or to have the sweetspot be suffix based (ie: "*_title" could have one sweetspot
even if the resulting field names are fr_title and en_title).
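
To make the glob idea concrete, here is a minimal sketch of a lookup that is configured with field names and field name globs directly, with no access to the IndexSchema. SweetSpotLookup and its methods are hypothetical names made up for this example, not code from the patch. The precedence (exact names first, then globs in the order they were added) and the restriction to a single leading or trailing "*" are assumptions of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the SOLR-1365 patch.
class SweetSpotLookup {
    private final Map<String, int[]> exact = new LinkedHashMap<>();
    private final Map<String, int[]> globs = new LinkedHashMap<>();

    // Registers a {min, max} sweetspot for a field name or a simple glob.
    void add(String pattern, int min, int max) {
        int[] range = {min, max};
        if (pattern.startsWith("*") || pattern.endsWith("*")) {
            globs.put(pattern, range);
        } else {
            exact.put(pattern, range);
        }
    }

    // Supports only a single leading or trailing '*' (assumption of this sketch).
    static boolean globMatches(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    // Returns the {min, max} pair for the field, or null if nothing matches.
    int[] find(String field) {
        int[] hit = exact.get(field);
        if (hit != null) {
            return hit; // a specific field name always wins
        }
        for (Map.Entry<String, int[]> e : globs.entrySet()) {
            if (globMatches(e.getKey(), field)) {
                return e.getValue(); // first matching glob wins
            }
        }
        return null;
    }
}
```

With "*_title", "fr_*" and "fr_title" all configured, fr_title gets its own sweetspot, en_title falls through to the suffix glob, and any other fr_ field picks up the "fr_*" entry, none of which depends on which dynamicFields the schema happens to declare.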

I think the patch could be improved, and i think there is definitely some code reuse possibility
for parsing the field name globs, but i don't know that it really needs run time access to
the IndexSchema (and it definitely doesn't need access to the SolrCore)

> Add configurable Sweetspot Similarity factory
> ---------------------------------------------
>                 Key: SOLR-1365
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.3
>            Reporter: Kevin Osborn
>            Priority: Minor
>             Fix For: 1.5
>         Attachments: SOLR-1365.patch
> This is some code that I wrote a while back.
> Normally, if you use SweetSpotSimilarity, you are going to make it do something useful
> by extending SweetSpotSimilarity. So, instead, I made a factory class and a configurable
> SweetSpotSimilarity. There are two classes. SweetSpotSimilarityFactory reads the parameters
> from schema.xml. It then creates an instance of VariableSweetSpotSimilarity, which is my custom
> SweetSpotSimilarity class. In addition to the standard functions, it also handles dynamic
> So, in schema.xml, you could have something like this:
> <similarity class="org.apache.solr.schema.SweetSpotSimilarityFactory">
>     <bool name="useHyperbolicTf">true</bool>
> 	<float name="hyperbolicTfFactorsMin">1.0</float>
> 	<float name="hyperbolicTfFactorsMax">1.5</float>
> 	<float name="hyperbolicTfFactorsBase">1.3</float>
> 	<float name="hyperbolicTfFactorsXOffset">2.0</float>
> 	<int name="lengthNormFactorsMin">1</int>
> 	<int name="lengthNormFactorsMax">1</int>
> 	<float name="lengthNormFactorsSteepness">0.5</float>
> 	<int name="lengthNormFactorsMin_description">2</int>
> 	<int name="lengthNormFactorsMax_description">9</int>
> 	<float name="lengthNormFactorsSteepness_description">0.2</float>
> 	<int name="lengthNormFactorsMin_supplierDescription_*">2</int>
> 	<int name="lengthNormFactorsMax_supplierDescription_*">7</int>
> 	<float name="lengthNormFactorsSteepness_supplierDescription_*">0.4</float>
>  </similarity>
> So, now everything is in a config file instead of having to create your own subclass.
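
One way to read the parameter names in the quoted example is as a setting name followed by an optional field name or glob after the first underscore. The sketch below groups parameters on that basis. It is a guess at the naming convention based only on the example above, not a description of what the attached patch does, and SweetSpotParamNames is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the naming convention it assumes is inferred from the example config.
class SweetSpotParamNames {

    // Splits "setting_fieldPattern" at the first underscore.
    // Returns {setting, fieldPattern}; fieldPattern is "" for a global default.
    static String[] split(String name) {
        int i = name.indexOf('_');
        if (i < 0) {
            return new String[] {name, ""};
        }
        return new String[] {name.substring(0, i), name.substring(i + 1)};
    }

    // Groups raw name/value pairs by field pattern, keeping declaration order.
    static Map<String, Map<String, String>> group(Map<String, String> params) {
        Map<String, Map<String, String>> byField = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String[] parts = split(e.getKey());
            byField.computeIfAbsent(parts[1], k -> new LinkedHashMap<>())
                   .put(parts[0], e.getValue());
        }
        return byField;
    }
}
```

Under that reading, "lengthNormFactorsMin_supplierDescription_*" becomes the setting lengthNormFactorsMin for the glob "supplierDescription_*", while "lengthNormFactorsMin" on its own is the default for every other field.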

