lucene-general mailing list archives

From Osullivan L. <L.Osulli...@swansea.ac.uk>
Subject RE: charFilter
Date Thu, 13 Sep 2012 15:53:41 GMT
Hi Folks,

Thanks to Robert and Uwe - I have a solution.

In the end, Robert's suggestion was the easiest to implement so I went with that. Uwe's advice
has given me the starting point for my next task and convinced me I need to find a Java 101
course.

Kind Regards,

Luke

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: 13 September 2012 16:16
To: general@lucene.apache.org
Subject: Re: charFilter

On Thu, Sep 13, 2012 at 6:43 AM, Osullivan L. <L.Osullivan@swansea.ac.uk> wrote:
>
> In my schema I have:
>
>     <fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>         <analyzer>
>           <charFilter class="com.test.solr.analysis.LukesTestCharFilterFactory"/>
>           <tokenizer class="solr.KeywordTokenizerFactory"/>
>         </analyzer>
>     </fieldType>
>

The main use of a CharFilter is to alter the text before the tokenizer runs: you
can use this to do things like adjust the tokenizer's behavior.

So in your example, since it just has KeywordTokenizer, I don't think CharFilter is the easiest
way to do what you want.
I think you should instead just use a TokenFilter that does your transformation, putting it
after KeywordTokenizer.

This should be significantly easier to write, as you don't need to deal with offset corrections
or any of that; you just change the term text.
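
As a rough sketch of that arrangement, the field type from the question could look something like
the following, with the charFilter line dropped and a filter placed after the tokenizer. The
factory name LukesTestFilterFactory is hypothetical (it does not appear in the original schema) and
stands in for whatever factory wraps the custom TokenFilter:

```xml
<fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <!-- KeywordTokenizer emits the whole field value as a single token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- Hypothetical factory name: the custom TokenFilter it creates would rewrite the term text -->
      <filter class="com.test.solr.analysis.LukesTestFilterFactory"/>
    </analyzer>
</fieldType>
```

Filters run in the order they are listed after the tokenizer, so the transformation would see the
single token that KeywordTokenizer produces.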

--
lucidworks.com