lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *
Date Tue, 14 Jul 2015 03:11:48 GMT
The word delimiter filter is removing the special characters. You can add a
file containing a list of the special characters that you wish to treat as
alpha, and reference it with the "types" parameter.
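
As a rough sketch of that suggestion: the file name wdfftypes.txt and the
choice of characters below are assumptions for illustration only, and the
remaining attributes are copied from the schema quoted below. The filter could
point at a type-mapping file through its "types" attribute:

```xml
<!-- wdfftypes.txt (hypothetical file name), placed in the core's conf directory.
     One mapping per line. Lines starting with # are treated as comments,
     so the # character itself is written as a Unicode escape here:

     ! => ALPHA
     @ => ALPHA
     \u0023 => ALPHA
     $ => ALPHA
     ^ => ALPHA
     * => ALPHA
-->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
        stemEnglishPossessive="1" preserveOriginal="1"
        types="wdfftypes.txt"/>
```

Documents indexed before such a change would need to be reindexed for the new
mappings to apply to them.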

-- Jack Krupansky

On Mon, Jul 13, 2015 at 6:43 PM, Steven White <swhite4141@gmail.com> wrote:

> Hi Everyone,
>
> I think the subject line said it all.  Here is the schema I'm using:
>
> <fieldType name="my_text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
>             stemEnglishPossessive="1" preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I'm guessing this is due to how solr.WhitespaceTokenizerFactory works, and
> that the characters it is not indexing are removed because they are
> considered "white-spaces"?  If so, how can I add %, &, etc. to this
> non-indexed list?  I would rather see none of these indexed than have some
> indexed and some not, which confuses my users.
>
> Thanks
>
> Steve
>
