lucene-solr-dev mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: CharFilter, analysis.jsp
Date Tue, 18 Aug 2009 03:03:27 GMT
That fixes it with analysis.jsp, but I don't think it fixes
FieldAnalysisRequestHandler. Using the field definition below and this
request:

http://localhost:8983/solr/analysis/field?analysis.fieldtype=html_text&analysis.fieldvalue=%3Chtml%3E%3Cbody%3Ewhatever%3C/body%3E%3C/html%3E

I still see <str name="text"><html><body>whatever</body></html></str>
come out of WhitespaceTokenizer.
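To reproduce this, it can help to build the request URL in code so the field value is always encoded the same way. Below is a small Java sketch. It assumes the host, port, handler path and parameter names from the request above, so adjust them for your own install. `AnalysisUrl` and `build` are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnalysisUrl {
    // Base URL copied from the request above; adjust host, port and path as needed.
    static final String BASE = "http://localhost:8983/solr/analysis/field";

    // Builds a FieldAnalysisRequestHandler request with URL-encoded parameters.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String build(String fieldType, String fieldValue) {
        return BASE
            + "?analysis.fieldtype=" + URLEncoder.encode(fieldType, StandardCharsets.UTF_8)
            + "&analysis.fieldvalue=" + URLEncoder.encode(fieldValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("html_text", "<html><body>whatever</body></html>"));
    }
}
```

URLEncoder also encodes `/` as `%2F`, which the hand-written request left literal. Both forms decode to the same parameter value on the server side.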

Does the consumer of an Analyzer from a FieldType have to do anything
special to use CharFilters, or should it all "just work"?

	Erik
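The question comes down to where the char filters sit relative to the tokenizer. A CharFilter is itself a Reader. Somewhere between the raw field value and the tokenizer, each char filter has to wrap that Reader. If any code path hands the tokenizer the unwrapped Reader, the markup comes through untouched, which matches the output above.

The Java sketch below shows that ordering without using Lucene. `TagStripReader`, `whitespaceTokens` and `analyze` are hypothetical names, and the tag stripping is much cruder than HTMLStripCharFilter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CharFilterChainSketch {

    // Hypothetical stand-in for HTMLStripCharFilter: drops everything between '<' and '>'.
    // The real filter also handles entities, comments and scripts, and corrects offsets.
    static class TagStripReader extends Reader {
        private final Reader in;
        private boolean inTag = false;

        TagStripReader(Reader in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            int n = 0;
            int c;
            while (n < len && (c = in.read()) != -1) {
                if (inTag) {
                    if (c == '>') inTag = false;   // end of markup
                } else if (c == '<') {
                    inTag = true;                  // start of markup
                } else {
                    buf[off + n++] = (char) c;     // ordinary text passes through
                }
            }
            return n == 0 ? -1 : n;
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    // Hypothetical whitespace tokenizer: splits whatever Reader it is handed.
    static List<String> whitespaceTokens(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) { tokens.add(current.toString()); current.setLength(0); }
            } else {
                current.append((char) c);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    // Wrap the raw Reader with the char filter(s) before the tokenizer reads it.
    // Passing the raw Reader straight through skips the char filters entirely.
    static List<String> analyze(String value, boolean applyCharFilters) {
        Reader reader = new StringReader(value);
        if (applyCharFilters) reader = new TagStripReader(reader);
        try {
            return whitespaceTokens(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body>whatever</body></html>";
        System.out.println(analyze(html, false)); // [<html><body>whatever</body></html>]
        System.out.println(analyze(html, true));  // [whatever]
    }
}
```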


On Aug 17, 2009, at 10:52 PM, Yonik Seeley wrote:

> I broke it with reusable token streams.  Just checked in a fix - can
> you try now?
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Aug 17, 2009 at 10:17 PM, Erik Hatcher
> <erik.hatcher@gmail.com> wrote:
>> I'm interested in using a CharFilter, something like this:
>>
>>    <fieldType name="html_text" class="solr.TextField">
>>      <analyzer>
>>        <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>> I'm hoping to put in a value like
>> "<html><body>whatever</body></html>" and have "whatever" come back
>> out. In analysis.jsp, I see that happening in the verbose output, but
>> it doesn't make it to the tokenizer input; the original string makes
>> it there.
>>
>> I must be misunderstanding something about CharFilters and how to use
>> them in Solr. HTMLStripWhitespaceTokenizerFactory is deprecated in
>> favor of the above design, I think, but it does what I'm after.
>>
>> Solr only seems to use CharFilters in analysis.jsp. Is that correct?
>> Shouldn't they be factored into the analyzer for each field (as in
>> FieldAnalysisRequestHandler)?
>>
>> Thanks,
>>        Erik
>>
>>
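Yonik's note that reusable token streams broke this points to a second place where the char filters have to be applied. When an analyzer caches a tokenizer and resets it onto a new Reader, the reset path needs the char filters too, not only the first construction.

The exact shape of the checked-in fix isn't shown here. The Java sketch below only illustrates that failure mode. `ReusableChainSketch`, `ToyWhitespaceTokenizer`, `STRIP_TAGS` and `filterOnReuse` are hypothetical and are not Lucene or Solr APIs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ReusableChainSketch {

    // Reads a Reader to the end; used by both the toy char filter and the toy tokenizer.
    static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Hypothetical reusable tokenizer: reset() points it at a new Reader.
    static class ToyWhitespaceTokenizer {
        private Reader input;
        void reset(Reader input) { this.input = input; }
        List<String> tokens() {
            List<String> out = new ArrayList<>();
            for (String t : drain(input).split("\\s+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        }
    }

    // Hypothetical char filter: strips anything shaped like a tag (no entities, no offsets).
    static final UnaryOperator<Reader> STRIP_TAGS =
        raw -> new StringReader(drain(raw).replaceAll("<[^>]*>", ""));

    private final UnaryOperator<Reader> charFilters;
    private final boolean filterOnReuse; // false mimics the regression described in the thread
    private ToyWhitespaceTokenizer cached;

    ReusableChainSketch(UnaryOperator<Reader> charFilters, boolean filterOnReuse) {
        this.charFilters = charFilters;
        this.filterOnReuse = filterOnReuse;
    }

    List<String> analyze(String value) {
        Reader raw = new StringReader(value);
        if (cached == null) {
            // First call: build the tokenizer on the filtered Reader.
            cached = new ToyWhitespaceTokenizer();
            cached.reset(charFilters.apply(raw));
        } else {
            // Reuse path: the char filters must be applied here too, not just on first use.
            cached.reset(filterOnReuse ? charFilters.apply(raw) : raw);
        }
        return cached.tokens();
    }

    public static void main(String[] args) {
        String html = "<html><body>whatever</body></html>";
        ReusableChainSketch broken = new ReusableChainSketch(STRIP_TAGS, false);
        ReusableChainSketch fixed = new ReusableChainSketch(STRIP_TAGS, true);
        System.out.println(broken.analyze(html) + " then " + broken.analyze(html));
        System.out.println(fixed.analyze(html) + " then " + fixed.analyze(html));
    }
}
```

With `filterOnReuse` set to false, the first call yields `[whatever]` and the second yields the raw markup as a single token. That is the same symptom reported at the top of this thread.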

