lucene-dev mailing list archives

From "Peter Sturge (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Date Wed, 23 Mar 2011 18:11:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268 ]

Peter Sturge commented on SOLR-2438:
------------------------------------

If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return
different, and sometimes empty, results.
This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField
entries in schema.xml.
The new attribute is called:  {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100" ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}

It's worth noting that this will lower-case text for ALL terms that match the field type -
including synonyms and stemmers.

For backward compatibility, the default behaviour is as before - i.e. a case sensitive wildcard
search ({{ignoreCaseForWildcards=false}}).
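To make the difference concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not code from the patch: the class and method names are hypothetical, and it assumes that the index analyzer lower-cases every token while a wildcard term is matched as typed unless the flag is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: none of these names come from SOLR-2438.patch.
final class WildcardCaseDemo {

    // Stand-in for an index analyzer ending in LowerCaseFilterFactory:
    // every token is stored in lower case.
    static List<String> indexTerms(List<String> rawTokens) {
        List<String> terms = new ArrayList<>();
        for (String token : rawTokens) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Matches a trailing-* wildcard query against the indexed terms.
    // Assumption: without the flag the wildcard term is used exactly as typed.
    static List<String> prefixSearch(List<String> indexed, String query,
                                     boolean ignoreCaseForWildcards) {
        String prefix = query.endsWith("*")
                ? query.substring(0, query.length() - 1)
                : query;
        if (ignoreCaseForWildcards) {
            prefix = prefix.toLowerCase(Locale.ROOT);
        }
        List<String> hits = new ArrayList<>();
        for (String term : indexed) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = indexTerms(Arrays.asList("MyTerm", "Other"));
        System.out.println(prefixSearch(indexed, "myter*", false)); // [myterm]
        System.out.println(prefixSearch(indexed, "MyTer*", false)); // []
        System.out.println(prefixSearch(indexed, "MyTer*", true));  // [myterm]
    }
}
```

With the flag off, MyTer* finds nothing because the index only holds the lower-cased form; with the flag on, the wildcard term is lower-cased first and matches.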

The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community members may
see use cases where this approach could pose problems. I'm all for feedback to enhance the
functionality...

The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr -
in line with the 'it just works' Solr philosophy.

Enjoy!


> Case Insensitive Search for Wildcard Queries
> --------------------------------------------
>
>                 Key: SOLR-2438
>                 URL: https://issues.apache.org/jira/browse/SOLR-2438
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Peter Sturge
>         Attachments: SOLR-2438.patch
>
>
> This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types.
> This patch extends the excellent work done by Yonik and Michael in SOLR-219.
> The approach here is different enough (imho) to warrant a separate JIRA issue.
