lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
Date Wed, 21 Mar 2012 18:53:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234855#comment-13234855 ]

Robert Muir commented on SOLR-2921:
-----------------------------------

Patch looks good: I think you should commit it and I'll follow up with the other ones.

Only one nitpick:
{noformat}
-/** 
+/**
  * Factory for {@link TurkishLowerCaseFilter}.
  * <pre class="prettyprint" >
  * &lt;fieldType name="text_trlwr" class="solr.TextField" positionIncrementGap="100"&gt;
- *   &lt;analyzer&gt;
- *     &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
- *     &lt;filter class="solr.TurkishLowerCaseFilterFactory"/&gt;
- *   &lt;/analyzer&gt;
- * &lt;/fieldType&gt;</pre> 
+ * &lt;analyzer&gt;
+ * &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
+ * &lt;filter class="solr.TurkishLowerCaseFilterFactory"/&gt;
+ * &lt;/analyzer&gt;
+ * &lt;/fieldType&gt;</pre>
+ *
{noformat}

Did your IDE do this? I don't think we should lose the indentation of the example there.

                
> Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2921
>                 URL: https://issues.apache.org/jira/browse/SOLR-2921
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>         Environment: All
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch
>
>
> SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a "multiterm" analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent:
>  * ASCIIFoldingFilterFactory
>  * LowerCaseFilterFactory
>  * LowerCaseTokenizerFactory
>  * MappingCharFilterFactory
>  * PersianCharFilterFactory
> When users put any of the above in their query analyzer, Solr will "do the right thing" at query time and the perennial question users have, "why didn't my wildcard query automatically lower-case (or accent fold or....) my terms?" will be gone. Die question die!
> But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers.
> Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case.
> Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know.
> ArabicNormalizationFilterFactory
> GreekLowerCaseFilterFactory
> HindiNormalizationFilterFactory
> ICUFoldingFilterFactory
> ICUNormalizer2FilterFactory
> ICUTransformFilterFactory
> IndicNormalizationFilterFactory
> ISOLatin1AccentFilterFactory
> PersianNormalizationFilterFactory
> RussianLowerCaseFilterFactory
> TurkishLowerCaseFilterFactory
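
For readers unfamiliar with the idea in the quoted description, here is a minimal, self-contained sketch of how a "multiterm" chain might be assembled from the components that opt in. It does not use Solr's classes: `MultiTermAware`, `TermTransform`, `LowerCaseTransform`, `SuffixStripTransform` and `MultiTermSketch` are hypothetical stand-ins invented for illustration, and the real MultiTermAwareComponent interface and factory classes may differ in signature and behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analysis component: rewrites a single term.
interface TermTransform {
    String apply(String term);
}

// Hypothetical stand-in for the opt-in marker described in the issue.
// A component returns whatever should run on multi-term (e.g. wildcard) queries.
interface MultiTermAware {
    TermTransform getMultiTermComponent();
}

// Lowercasing is safe on a wildcard fragment, so this component opts in
// and simply returns itself.
final class LowerCaseTransform implements TermTransform, MultiTermAware {
    public String apply(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public TermTransform getMultiTermComponent() {
        return this;
    }
}

// A stemmer-like component that does NOT opt in: stripping a suffix from a
// partial term could change what a wildcard matches.
final class SuffixStripTransform implements TermTransform {
    public String apply(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }
}

final class MultiTermSketch {
    // Keep only the components that opted in, in their original order.
    static List<TermTransform> multiTermChain(List<TermTransform> queryChain) {
        List<TermTransform> result = new ArrayList<TermTransform>();
        for (TermTransform t : queryChain) {
            if (t instanceof MultiTermAware) {
                result.add(((MultiTermAware) t).getMultiTermComponent());
            }
        }
        return result;
    }

    // Run a term through every component of a chain, in order.
    static String analyze(List<TermTransform> chain, String term) {
        String out = term;
        for (TermTransform t : chain) {
            out = t.apply(out);
        }
        return out;
    }
}
```

In this sketch a chain of lowercasing followed by suffix stripping reduces to lowercasing alone for multi-term queries, so a wildcard fragment is case-folded but otherwise left intact. Returning a component from `getMultiTermComponent()` rather than a boolean leaves room for the case mentioned in the description, where a tokenizer factory hands back a filter instead of itself.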

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

