lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Alphanumeric Wild card search
Date Thu, 02 Apr 2015 12:06:20 GMT
This is caused by the word delimiter filter: it breaks multi-part terms
(the hyphens trigger it) into multiple terms. Wildcards simply don't work
consistently well in such a situation. The basic problem is that the
presence of the wildcard causes all but the simplest token filtering stages
to be bypassed, particularly the word delimiter filter (because it would
have stripped out the wildcard asterisk). Your wildcard term is therefore
analyzed differently than it was indexed, so it fails to match. In other
cases it may match, but only if the abbreviated query-time filtering happens
to produce the same tokens as the full index-time filtering.

This is a limitation of Solr. You just have to learn to live with it. Or...
don't use the word delimiter filter when you need to be able to do
wildcards of multi-part terms.
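
A minimal sketch of that second option, assuming a separate field is
acceptable for wildcard searches. The names text_ws_lower and Name_wild are
hypothetical and do not come from the original schema; only the Name field
does.

```xml
<!-- Hypothetical field type without WordDelimiterFilterFactory,
     so a value like 1234-305 is indexed as one token. -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field used only for wildcard queries. -->
<field name="Name_wild" type="text_ws_lower" indexed="true" stored="false"/>

<!-- Copy the existing Name value into the wildcard-friendly field. -->
<copyField source="Name" dest="Name_wild"/>
```

With something like this in place, an unquoted query such as
Name_wild:1234-3* would be compared against whole tokens like 1234-305
rather than against their split parts. Existing documents would need to be
reindexed after the schema change.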

-- Jack Krupansky

On Thu, Apr 2, 2015 at 3:43 AM, Palagiri, Jayasankar <
Jayashankar.Palagiri@honeywell.com> wrote:

> Hello Team,
>
> Below is my field type
>
> <fieldType name="text_en_splitting" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>
> And my field is
>
> <field name="Name" type="text_en_splitting" indexed="true" stored="true" />
>
> I have a few documents in my index
>
> Like 1234-305, 1234-308, 1234-318, ...
>
> When I search for Name:"1234-*" I get the desired results, but when I
> search for Name:"123-3*" I get 0 results.
>
> Can someone help me find what is wrong with my indexing?
>
> Thanks and Regards,
> Jaya
>
>
