lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
Date Wed, 09 Feb 2011 21:55:57 GMT


Robert Muir commented on SOLR-2338:

...but even if we don't do that, I suppose it's also conceivable that someone might have their
own Similarity implementation that is expensive to instantiate (i.e., maintains some big
in-memory data structures?) and might want to be able to declare one instance and then refer
to it by name in many different fieldType declarations.

I don't think this is really a use case we need to support: the purpose of Similarity today
is to do term weighting, not to be a huge data-structure holder.

While I know Mike's original patch went this way with LUCENE-2392 (e.g. norms), I'm not sure
I like it being in Similarity in the future either.

Otherwise concepts like lazy-loading norms and all this other stuff get pushed onto the
Similarity, which is an awkward place for them (imagine if you have many fields).

So, I think we shouldn't really design for abuses of the API. If there are other use cases
for "named similarity" that have to do with term weighting, I'm interested.

> improved per-field similarity integration into schema.xml
> ---------------------------------------------------------
>                 Key: SOLR-2338
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
> Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there
> is only a 'global' factory for the SimilarityProvider.
> In my opinion this is too low-level because to customize Similarity on a per-field basis,
> you have to set your own CustomSimilarityProvider with <similarity class=.../> and manage
> the per-field mapping yourself in java code.
> Instead I think it would be better if you just specify the Similarity in the FieldType,
> like after <analyzer>.
> As far as the example, one idea from LUCENE-1360 was to make a "short_text" or "metadata_text"
> used by the various metadata fields in the example that has better norm quantization for
> its shortness...
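
To make the description above concrete, here is a minimal sketch of the kind of per-field mapping that currently has to be hand-written in Java. It is not Lucene or Solr code: the Similarity and SimilarityProvider interfaces below are hypothetical stand-ins defined only for this example, and the field names ("title", "author", "body") are made up. Only the names CustomSimilarityProvider and "short_text" are borrowed from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldSimilaritySketch {

    /** Hypothetical stand-in for a term-weighting Similarity (not the Lucene class). */
    public interface Similarity {
        String name();
    }

    /** Hypothetical stand-in for a provider that returns the Similarity for a field. */
    public interface SimilarityProvider {
        Similarity get(String field);
    }

    /** Trivial Similarity that only carries a label, so the mapping is observable. */
    public static final class NamedSimilarity implements Similarity {
        private final String name;

        public NamedSimilarity(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    /** Hand-maintained field-to-Similarity table with a fallback for unmapped fields. */
    public static final class CustomSimilarityProvider implements SimilarityProvider {
        private final Similarity fallback;
        private final Map<String, Similarity> perField = new HashMap<>();

        public CustomSimilarityProvider(Similarity fallback) {
            this.fallback = fallback;
        }

        /** Registers a Similarity for one field and returns this provider for chaining. */
        public CustomSimilarityProvider map(String field, Similarity similarity) {
            perField.put(field, similarity);
            return this;
        }

        @Override
        public Similarity get(String field) {
            return perField.getOrDefault(field, fallback);
        }
    }

    /** Builds the example mapping; the field names are made up for illustration. */
    public static SimilarityProvider buildExampleProvider() {
        Similarity shortText = new NamedSimilarity("short_text");
        return new CustomSimilarityProvider(new NamedSimilarity("default"))
                .map("title", shortText)
                .map("author", shortText);
    }

    public static void main(String[] args) {
        SimilarityProvider provider = buildExampleProvider();
        for (String field : new String[] {"title", "author", "body"}) {
            System.out.println(field + " -> " + provider.get(field).name());
        }
    }
}
```

The field-to-Similarity table here lives in code, outside the schema. The proposal in this issue would presumably let that association be declared on the fieldType instead, so a provider class like this would not need to be written by hand.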
