lucene-java-user mailing list archives

From Jithin <jithin1...@gmail.com>
Subject Re: Writing a TokenConcatenateFilter - junk characters appearing on output.
Date Sat, 01 Oct 2011 05:27:26 GMT
I meant to say: now my analyzer chain looks like this:

            <analyzer type="index">
                <charFilter class="solr.PatternReplaceCharFilterFactory"
                            pattern="[-_]" replacement=" " />
                <charFilter class="solr.PatternReplaceCharFilterFactory"
                            pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="words.txt" />
                <filter class="org.ctown.solr.analysis.CTConcatFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.PatternReplaceCharFilterFactory"
                            pattern="[-_]" replacement=" " />
                <charFilter class="solr.PatternReplaceCharFilterFactory"
                            pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
                <tokenizer class="solr.KeywordTokenizerFactory" />
            </analyzer>
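To check what the two character filters and the whitespace tokenizer above do to a piece of text, here is a rough stdlib-only Java approximation. It applies the same two regular expressions with String.replaceAll, splits on whitespace and lowercases. It is only an approximation: the real PatternReplaceCharFilter also corrects token offsets, WhitespaceTokenizer's idea of whitespace differs slightly from the regex \s, and the stopword and CTConcatFilterFactory steps are left out because words.txt and the filter's source aren't shown. The class and method names (CharFilterPreview, preview) are hypothetical. One detail it makes visible: inside [...] the + after \s is a literal plus sign, so + characters survive the second filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: approximates the index-time chain up to LowerCaseFilterFactory.
public class CharFilterPreview {
    // Same patterns as the two PatternReplaceCharFilterFactory entries.
    static final String SEPARATORS = "[-_]";
    // Inside [...] the trailing '+' is a literal plus sign, not a quantifier.
    static final String DISALLOWED = "[^\\p{L}\\p{Nd}\\p{Mn}\\p{Mc}\\s+]";

    public static List<String> preview(String input) {
        String text = input.replaceAll(SEPARATORS, " "); // first charFilter
        text = text.replaceAll(DISALLOWED, "");           // second charFilter
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {              // ~WhitespaceTokenizer
            if (!t.isEmpty()) {                            // skip leading empty piece
                tokens.add(t.toLowerCase(Locale.ROOT));    // ~LowerCaseFilter
            }
        }
        return tokens; // stopwords and concatenation intentionally not modelled
    }

    public static void main(String[] args) {
        System.out.println(preview("Bar_Grill-Menu, No.5 (C++)"));
        // prints [bar, grill, menu, no5, c++]
    }
}
```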

But only my first document is getting indexed. Is there any logging I can
enable to see what is going wrong?
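One possible cause, and only an assumption since the source of CTConcatFilterFactory isn't included here, is per-document state (for example a flag recording that the concatenated token has already been emitted) that is never cleared in reset(). Solr typically reuses the analyzer chain for successive documents, so such a filter would emit its token for the first document and nothing afterwards. The model below is hypothetical, stdlib-only Java and not Lucene code: ConcatModel, reset(List), indexAll and clearStateOnReset are made-up names that only mimic the shape of TokenFilter.incrementToken() and reset(). Here reset(List) both supplies the next document's tokens and clears state, jobs that Lucene splits between the tokenizer and each filter's own reset() override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a concatenating filter that is reused across documents.
public class ConcatModel {
    private final boolean clearStateOnReset; // false reproduces the suspected bug
    private Iterator<String> upstream;       // stands in for the input TokenStream
    private boolean done;                    // per-document state: token already emitted?
    private String term;                     // stands in for the term attribute

    ConcatModel(boolean clearStateOnReset) {
        this.clearStateOnReset = clearStateOnReset;
    }

    // Called once per document on the same instance.
    void reset(List<String> tokensOfNextDocument) {
        upstream = tokensOfNextDocument.iterator();
        if (clearStateOnReset) {
            done = false; // the line a buggy filter forgets
        }
    }

    // Emits a single token made of all upstream tokens, then reports end of stream.
    boolean incrementToken() {
        if (done) {
            return false;
        }
        done = true;
        StringBuilder sb = new StringBuilder();
        while (upstream.hasNext()) {
            sb.append(upstream.next());
        }
        if (sb.length() == 0) {
            return false;
        }
        term = sb.toString();
        return true;
    }

    // Feeds every document through one reused instance and collects emitted tokens.
    public static List<String> indexAll(List<List<String>> docs, boolean clearStateOnReset) {
        ConcatModel filter = new ConcatModel(clearStateOnReset);
        List<String> emitted = new ArrayList<>();
        for (List<String> doc : docs) {
            filter.reset(doc);
            while (filter.incrementToken()) {
                emitted.add(filter.term);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("new", "york"),
                Arrays.asList("san", "francisco"));
        System.out.println(indexAll(docs, true));  // prints [newyork, sanfrancisco]
        System.out.println(indexAll(docs, false)); // prints [newyork]
    }
}
```

Passing false for clearStateOnReset reproduces the "only the first document" symptom; in a real TokenFilter the equivalent fix would be overriding reset(), calling super.reset(), and clearing every per-document field there.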

--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3384419.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
