lucene-solr-user mailing list archives

From Tomás Fernández Löbbe <tomasflo...@gmail.com>
Subject Re: Preventing empty strings in index
Date Mon, 05 Dec 2011 12:28:27 GMT
You could try adding a solr.LengthFilterFactory to the analyzer chain to drop the empty tokens:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
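
A minimal sketch of how that might look in the fieldType from the original message, assuming the filter is placed directly after the PatternReplaceFilterFactory so that the empty tokens it produces are removed before the later filters run. The values min="1" and max="255" are illustrative assumptions, not taken from the thread; any upper bound larger than your longest expected term should work:

```xml
<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Replaces numbers-only tokens with an empty string -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$" replacement="" />
    <!-- Assumed values: keep only tokens between 1 and 255 characters long,
         which discards the empty tokens left by the filter above -->
    <filter class="solr.LengthFilterFactory" min="1" max="255" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt" ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_terms.txt" />
  </analyzer>
</fieldType>
```

The field would need to be reindexed for the change to take effect on existing documents.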

Regards,

Tomás

On Mon, Dec 5, 2011 at 6:01 AM, Marian Steinbach <marian.steinbach@gmail.com> wrote:

> Hi!
>
> I am surprised to find an empty string as the most frequent index term in
> one of my fields. Until now I didn't even know that empty strings would be
> indexed.
>
> Here is the schema.xml excerpt for that field:
>
> <fieldType name="text_terms" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$" replacement="" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt" ignoreCase="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_terms.txt" />
>   </analyzer>
> </fieldType>
>
> <field name="terms" type="text_terms" indexed="true" stored="false" multiValued="true"/>
>
>
> I suspect that the PatternReplaceFilterFactory with pattern="^[0-9]+$"
> is causing the empty strings. I introduced that filter to prevent
> numbers-only strings from being added to the index.
>
> Any hint on how I can get rid of numbers AND empty strings?
>
> Thanks!
>
> Marian
>
