lucene-solr-user mailing list archives

From Dmitry Kan <solrexp...@gmail.com>
Subject Re: [solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory
Date Sat, 15 Mar 2014 19:25:46 GMT
Hi Doug,

I have tried the patch and it seems to have corrected the issue. Thanks for
pointing me to the JIRA issue.

Dmitry


On Sat, Mar 15, 2014 at 8:05 PM, Doug Turnbull <
dturnbull@opensourceconnections.com> wrote:

> The char filter is not broken. There's a bug in 4.7 in the analysis UI:
>
> https://issues.apache.org/jira/browse/SOLR-5800
>
> It was unclear to me whether the fix would be part of a 4.7.1 release. I
> hope so, as it would probably save people a lot of time they would
> otherwise spend thinking their analyzers are broken.
>
>
> Sent from my Windows Phone
> From: Dmitry Kan
> Sent: 3/15/2014 1:58 PM
> To: solr-user@lucene.apache.org
> Subject: [solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory
>
> Hello,
>
> The following field type does not get analyzed properly on the Solr 4.7.0
> analysis page:
>
>     <fieldType name="text_en_splitting" class="solr.TextField"
>                positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>         <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                 ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="lang/stopwords_en.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Example text:
> fox jumps
>
> Screenshot:
> http://pbrd.co/1lEVEIa
>
> This works fine in Solr 4.6.1.
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
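
Since the bug discussed above is in the analysis UI rather than in the char
filter, one way to check the analysis chain without the admin page may be to
query the field analysis request handler directly. The following is a minimal
sketch, not a definitive recipe. It assumes the handler is registered at
/analysis/field in solrconfig.xml (as in the stock example configuration),
and that Solr runs at http://localhost:8983/solr with a core named
collection1; the host, port, core name, and helper function names are all
hypothetical and need to be adapted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's field analysis request handler.

    Assumes the handler is mapped to /analysis/field for the given core.
    """
    params = urlencode({
        "analysis.fieldtype": field_type,   # field type name from schema.xml
        "analysis.fieldvalue": text,        # text to run through the index analyzer
        "wt": "json",                       # ask for a JSON response
    })
    return f"{base_url}/{core}/analysis/field?{params}"


def fetch_analysis(url, timeout=10):
    """Fetch and decode the JSON analysis response (needs a running Solr)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and core name; the field type and sample text
    # are the ones from the message above.
    url = build_analysis_url(
        "http://localhost:8983/solr", "collection1",
        "text_en_splitting", "fox jumps",
    )
    print(url)
    # With a Solr instance running, the raw per-stage output could be
    # inspected like this:
    # print(json.dumps(fetch_analysis(url), indent=2))
```

If the raw response shows the expected tokens for each stage while the admin
page does not, that would point to the UI rather than to the analyzer
configuration.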
