lucene-solr-user mailing list archives

From Ariel <isaacr...@gmail.com>
Subject Re: How to index correctly a text save with tinyMCE
Date Thu, 23 Jun 2011 16:34:06 GMT
I'm sorry to bother you again, but this still doesn't work. I have written
this configuration in my schema.xml file:

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

But it still doesn't convert the entity to the correct character. For
instance, Espa&amp;ntilde;a should be converted to España, but it
remains Espa&amp;ntilde;a.
I have included in this email an attachment with the results of the
analysis.jsp application.

Any help would be really appreciated.
Regards,
Ariel
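
For reference, here is a minimal sketch of how the chain above might sit
inside a complete field type, with the char filter declared ahead of the
tokenizer as suggested in the quoted replies below. The field type name
"text_html_es" and the positionIncrementGap value are assumptions made for
illustration; the analyzer entries are the ones from this message.

```xml
<!-- "text_html_es" is a hypothetical name; use whatever your schema already defines. -->
<fieldType name="text_html_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters work on the raw character stream, so they are listed before the tokenizer. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Token filters then run in the order they are listed. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Keep in mind that analysis only changes the terms written to the index.
The stored value of a field, which is what comes back in search results,
is always the original text, so it is worth checking whether the
unconverted entity appears in the analysis output or only in the stored
value.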

On 6/16/11, Steven A Rowe <sarowe@syr.edu> wrote:
> Hi Ariel,
>
> As Shawn says, char filters come before tokenizers.
>
> You need to use a <charFilter> tag instead of <filter> tag.
>
> I've updated the HTMLStripCharFilter documentation on the Solr wiki to
> include this information:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
>
> Steve
>
>> -----Original Message-----
>> From: Shawn Heisey [mailto:solr@elyograg.org]
>> Sent: Thursday, June 16, 2011 1:32 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index correctly a text save with tinyMCE
>>
>> On 6/16/2011 11:12 AM, Ariel wrote:
>> > Thanks for your answer, I have just put the filter in my schema.xml but
>> > it doesn't work. I am using Solr 1.4 and my conf is:
>> >
>> > <code>
>> > <analyzer type="index">
>> >      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>> >      <filter class="solr.LowerCaseFilterFactory"/>
>> >      <filter class="solr.HTMLStripCharFilterFactory"/>
>> >      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>> >      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >   </analyzer>
>> > </code>
>> >
>> >
>> > But it doesn't work; in the Tomcat 6 logs I get this error:
>> >
>> >   java.lang.ClassCastException:
>> > org.apache.solr.analysis.HTMLStripCharFilterFactory cannot be cast to
>> > org.apache.solr.analysis.TokenFilterFactory
>>
>> According to the wiki, the output of that filter must be passed to
>> either another CharFilter or a Tokenizer.  Try moving it before
>> WhitespaceTokenizerFactory.
>>
>> Shawn
>
>
