lucene-dev mailing list archives

From Christoph Straßer (JIRA) <j...@apache.org>
Subject [jira] [Updated] (SOLR-4873) star-wildcard (*) does not work together with stemming
Date Wed, 29 May 2013 13:27:19 GMT

     [ https://issues.apache.org/jira/browse/SOLR-4873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Straßer updated SOLR-4873:
------------------------------------

    Attachment: PostContent2Solr.java

Java code that uses SolrJ to index the sample files.
                
> star-wildcard (*) does not work together with stemming
> ------------------------------------------------------
>
>                 Key: SOLR-4873
>                 URL: https://issues.apache.org/jira/browse/SOLR-4873
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.2
>         Environment: Windows 7, Java 7
>            Reporter: Christoph Straßer
>         Attachments: PostContent2Solr.java, tochter1.htm, tochter2.htm, tochter3.htm, tochter4.htm
>
>
> Without a stemming filter (e.g. solr.SnowballPorterFilterFactory), the query
> http://localhost:8983/solr/collection1/select?q=Tochter*
> matches "Tochter", "Tochterunternehmen", and "Töchter".
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> 	<analyzer type="index">
> 		<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> 		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> 		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
> 		<filter class="solr.LowerCaseFilterFactory"/>
> 	</analyzer>
> With a stemming filter, the same query
> http://localhost:8983/solr/collection1/select?q=Tochter*
> matches only "Tochterunternehmen", not "Tochter" or "Töchter". (Stemming is applied for both type="index" and type="query".)
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> 	<analyzer type="index">
> 		<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> 		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> 		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
> 		<filter class="solr.LowerCaseFilterFactory"/>
> 		<filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
> 	</analyzer>
> 	
> Sample files attached.
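
One plausible reading of the report, sketched below in plain Java without any Solr or Lucene dependency: wildcard terms are typically not run through the stemmer at query time, so the lowercased prefix "tochter" is compared against index terms that have already been stemmed. The stems used here ("tocht", "tochterunternehm") are assumptions derived from the published Snowball German rules, not values taken from the reporter's index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WildcardStemSketch {

    // Hypothetical index WITHOUT stemming: original word -> indexed term
    // (lowercased, accents mapped to ASCII).
    static Map<String, String> unstemmedIndex() {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("Tochter", "tochter");
        terms.put("T\u00f6chter", "tochter");
        terms.put("Tochterunternehmen", "tochterunternehmen");
        return terms;
    }

    // Hypothetical index WITH stemming. The stems are ASSUMED from the
    // Snowball German rules ("er" and "en" suffixes removed).
    static Map<String, String> stemmedIndex() {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("Tochter", "tocht");
        terms.put("T\u00f6chter", "tocht");
        terms.put("Tochterunternehmen", "tochterunternehm");
        return terms;
    }

    // Models a trailing-star wildcard query: the query text is lowercased
    // but NOT stemmed, then matched as a prefix of each indexed term.
    static List<String> prefixMatches(String wildcardQuery, Map<String, String> index) {
        String prefix = wildcardQuery.toLowerCase(Locale.ROOT).replaceAll("\\*$", "");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : index.entrySet()) {
            if (entry.getValue().startsWith(prefix)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("no stemming:   " + prefixMatches("Tochter*", unstemmedIndex()));
        System.out.println("with stemming: " + prefixMatches("Tochter*", stemmedIndex()));
    }
}
```

Under these assumptions the unstemmed index returns all three words, while the stemmed index returns only "Tochterunternehmen", because "tocht" does not start with "tochter". That matches the behaviour described above, but it remains a model of the suspected cause rather than a trace of what Solr actually does with the attached files.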

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

