lucene-solr-user mailing list archives

From Marian Steinbach
Subject Preventing empty strings in index
Date Mon, 05 Dec 2011 09:01:40 GMT

I am surprised to find an empty string as the most frequent index term in
one of my fields. Until now I didn't even know that empty strings could be
indexed.

Here is the schema.xml excerpt for that field:

<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
            replacement="" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
            ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_terms.txt" />
  </analyzer>
</fieldType>

<field name="terms" type="text_terms" indexed="true" stored="false"

I have the suspicion that PatternReplaceFilterFactory
with pattern="^[0-9]+$" is causing the empty strings. I introduced that
filter to prevent numbers-only strings from being added to the index.

Any hint on how I can get rid of numbers AND empty strings?
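
One thing that might work, as an untested sketch: if my suspicion is right, the pattern filter only rewrites the text of a digits-only token to "" and leaves the now-empty token in the stream. Placing a solr.LengthFilterFactory directly after it should then drop those zero-length tokens. The LengthFilterFactory line and its min/max values are my assumption and are not part of the current schema.

```xml
<!-- Sketch only: same filter chain, with one assumed addition (the length filter) -->
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory" />
  <!-- Replaces the text of digits-only tokens with "", leaving an empty token behind -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
          replacement="" />
  <!-- Assumed addition: keep only tokens of 1 to 255 characters, so empty ones are dropped -->
  <filter class="solr.LengthFilterFactory" min="1" max="255" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
          ignoreCase="true" />
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords_terms.txt" />
</analyzer>
```

The upper bound of 255 is arbitrary; only min="1" matters for removing the empty strings. The field would need to be reindexed for the change to show up in the term statistics.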


