lucene-solr-user mailing list archives

From Marian Steinbach <marian.steinb...@gmail.com>
Subject Preventing empty strings in index
Date Mon, 05 Dec 2011 09:01:40 GMT
Hi!

I was surprised to find that the most frequent index term in one of my
fields is the empty string. Until now I didn't even know that empty strings
could be indexed.

Here is the schema.xml excerpt for that field:

<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
            replacement="" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
            ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_terms.txt" />
  </analyzer>
</fieldType>

<field name="terms" type="text_terms" indexed="true" stored="false"
multiValued="true"/>


I suspect that the PatternReplaceFilterFactory with pattern="^[0-9]+$" is
causing the empty strings. I introduced that filter to keep numbers-only
tokens out of the index.

Any hints on how I can get rid of both numbers-only tokens AND empty strings?
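If the suspicion is right, the likely mechanism is that
PatternReplaceFilterFactory only rewrites the text of each token and never
drops a token, so a purely numeric token such as "2011" survives as a
zero-length term. One possible approach, given here as an untested sketch
that assumes solr.LengthFilterFactory (with its min and max attributes) is
available in the Solr version in use, is to place a length filter directly
after the pattern filter so that min="1" discards the emptied tokens. The
max="255" value is an arbitrary upper bound, chosen only because it matches
StandardTokenizer's default maximum token length.

<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Rewrites numbers-only tokens to "", but keeps them as tokens -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$"
            replacement="" />
    <!-- Assumption: LengthFilterFactory drops tokens shorter than min,
         which removes the zero-length tokens left by the filter above;
         max="255" is an arbitrary upper bound -->
    <filter class="solr.LengthFilterFactory" min="1" max="255" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt"
            ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_terms.txt" />
  </analyzer>
</fieldType>

Putting the length filter immediately after the pattern filter means the
empty tokens never reach the synonym or stop filters. Existing documents
would need to be reindexed before the empty term disappears from the index.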

Thanks!

Marian
