lucene-solr-dev mailing list archives

From Erik Hatcher <>
Subject CharFilter, analysis.jsp
Date Tue, 18 Aug 2009 02:17:51 GMT
I'm interested in using a CharFilter, something like this:

     <fieldType name="html_text" class="solr.TextField">
       <analyzer>
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       </analyzer>
     </fieldType>

The hope is to put in a value like "<html><body>whatever</body></html>"
and have "whatever" come back out.  In analysis.jsp, I see that
happening in the verbose output, but the stripped text doesn't make it
to the tokenizer input; the original string arrives there instead.
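
To make the expected behavior concrete, here is a minimal Python sketch
of the two-stage chain described above. It is only an illustration, not
Solr or Lucene code: strip_html, whitespace_tokenize, and analyze are
hypothetical names, and the standard-library HTMLParser stands in for
HTMLStripCharFilterFactory under the assumption that the char filter
simply drops markup before the tokenizer sees the text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML string."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    # Hypothetical stand-in for the char filter stage: drop the markup
    # and keep the text nodes, separated by spaces.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)


def whitespace_tokenize(text):
    # Stand-in for the tokenizer stage: split on runs of whitespace.
    return text.split()


def analyze(text):
    # The char filter must run first, so the tokenizer only ever sees
    # the stripped text.
    return whitespace_tokenize(strip_html(text))
```

With this chaining, analyze('<html><body>whatever</body></html>') yields
the single token "whatever". Calling whitespace_tokenize directly on the
raw string reproduces the symptom reported here: the markup reaches the
tokenizer untouched.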

I must be misunderstanding something about CharFilters and how to use
them in Solr.  HTMLStripWhitespaceTokenizerFactory is deprecated in
favor of the above design, I think, but it does what I'm after.

Solr only seems to use CharFilters in analysis.jsp.  Is that
correct?  Shouldn't they be factored into the analyzer for each
field (as in FieldAnalysisRequestHandler)?

