lucene-solr-user mailing list archives

From cyang2010 <ysxsu...@hotmail.com>
Subject stopFilterFactor and SnowballPorterFilterFactory not work for Spanish
Date Tue, 15 Mar 2011 23:07:06 GMT
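I am using Solr 1.4.1. I am trying to index a Spanish field using the
following tokenizer/filters:

    WhitespaceTokenizerFactory
    StopFilterFactory (words=stopwords_es.txt, ignoreCase=true)
    LowerCaseFilterFactory
    SnowballPorterFilterFactory (language=Spanish)
    RemoveDuplicatesTokenFilterFactory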

Using the field analysis page in the Solr admin, I can tell that
StopFilterFactory and SnowballPorterFilterFactory with Spanish are not
working right:

1. After StopFilterFactory, "la" should be gone, but it is not.
2. After SnowballPorterFilterFactory (language=Spanish), "cöcktäils" should
become "cöcktäil", but I still see the token "cöcktäils" coming out.

I configured a Spanish stopword list for the StopFilterFactory.

Field name: title_name
Field value: la Cöcktäils


Index Analyzer
===============================================================================
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position       1       2
term text           la      Cöcktäils
term type           word    word
source start,end    0,2     3,12
payload
===============================================================================
org.apache.solr.analysis.StopFilterFactory {words=stopwords_es.txt, ignoreCase=true}
term position       1       2
term text           la      Cöcktäils
term type           word    word
source start,end    0,2     3,12
payload
===============================================================================
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position       1       2
term text           la      cöcktäils
term type           word    word
source start,end    0,2     3,12
payload
===============================================================================
org.apache.solr.analysis.SnowballPorterFilterFactory {language=Spanish}
term position       1       2
term text           la      cöcktäils
term type           word    word
source start,end    0,2     3,12
payload
===============================================================================
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position       1       2
term text           la      cöcktäils
term type           word    word
source start,end    0,2     3,12
payload
===============================================================================

I just copied the text from this URL to form my stopwords_es.txt:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
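
A possible check on a file copied this way, not a confirmed diagnosis: the
Snowball stop lists put a "|" comment after each entry, so a loader that
expects one bare word per line would see the whole line (for example
"la             |  the, her") as the stopword and never match "la". Whether
the StopFilterFactory in Solr 1.4.1 reads the file that way is an assumption
here. The Python sketch below strips the "|" comments and keeps only the
words; the function name and the sample lines are hypothetical stand-ins for
the real file.

```python
def snowball_to_plain(lines):
    """Return the stopwords found in Snowball-format lines.

    Everything after the first '|' on a line is treated as a comment;
    whatever remains is split on whitespace, one entry per word.
    """
    words = []
    for line in lines:
        entry = line.split("|", 1)[0]  # drop the trailing comment, if any
        words.extend(entry.split())    # blank or comment-only lines add nothing
    return words


# Hypothetical sample in the style of the Snowball Spanish stop list.
sample = [
    " | A Spanish stop word list. Comments begin with vertical bar.",
    "",
    "de             |  from, of",
    "la             |  the, her",
    "que            |  who, that",
]

plain = snowball_to_plain(sample)
print("\n".join(plain))
```

Writing the resulting words one per line, UTF-8 encoded, to stopwords_es.txt
and re-running the field analysis would show whether the file format is what
keeps "la" from being removed.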



I look forward to your help.

