lucene-dev mailing list archives

From "Alexandre Rafalovitch (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (SOLR-10145) Alphanumeric text getting indexed separately for WhitespaceTokenizerFactory
Date Thu, 16 Feb 2017 15:39:41 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Rafalovitch closed SOLR-10145.
----------------------------------------
    Resolution: Information Provided

> Alphanumeric text getting indexed separately for WhitespaceTokenizerFactory 
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-10145
>                 URL: https://issues.apache.org/jira/browse/SOLR-10145
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Lakshmi Kanta Nandi
>
> Alphanumeric text is being split and indexed as separate tokens with WhitespaceTokenizerFactory. I have also tried the tokenizer class solr.KeywordTokenizerFactory, but my text is still split and indexed separately.
> Scenario
> Input string: ABCD1234EFGH
> Generated index: ABCD, 1234, EFGH
> Expected index: ABCD1234EFGH
> Search
> Input: ABC* returns success 
> Input: ABCD123* returns fail (success expected)
> Input: ABCD1234 returns success
> Configuration
> {code}
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
> {code}
> Solr version: 4.3
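
A possible reading of the report above, offered as a sketch and not as the resolution recorded on the issue: neither tokenizer splits ABCD1234EFGH, since the string contains no whitespace. The split most likely comes from WordDelimiterFilterFactory, which breaks tokens at letter/digit transitions unless splitOnNumerics is set to "0" (it defaults to "1"). A wildcard query such as ABCD123* is not passed through the full analysis chain, so it would then find no indexed term starting with that prefix. The field type below is hypothetical: the name text_alnum is invented for illustration, and the stop word, synonym, and stemming filters from the original configuration are left out for brevity.

```xml
<!-- Hypothetical field type; the name "text_alnum" is not from the original report. -->
<fieldType name="text_alnum" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnNumerics="0": do not break tokens at letter/digit transitions,
         so ABCD1234EFGH should stay a single token. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A change like this only affects documents indexed after the schema is updated, so existing documents would need to be reindexed. The Analysis screen in the Solr admin UI can be used to check which tokens each filter emits for a given input.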



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

