lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Lowercase all characters in String
Date Tue, 11 Oct 2016 14:24:10 GMT
Hi,

I would like to find out the best way to lowercase all the text while
preserving all the tokens.

Since I need to preserve every character of the text (including symbols and
white space), I'm using the String field type. However, I can't apply
LowerCaseFilterFactory to a String field.

I found that we can use WhitespaceTokenizerFactory followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory preserves the
symbols, it still splits on whitespace, which we do not want. The reason is
that we may have values such as 'One' and 'One Way'. If we use
WhitespaceTokenizerFactory and search for 'One', the search also returns
records containing 'One Way', when it should return only records whose
value is exactly 'One'.
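
To make the problem concrete, here is a minimal sketch in plain Python. It is
not Solr code and does not call any Solr or Lucene API; it only mimics the
two behaviours being compared. The function names
(whitespace_lowercase_tokens, whole_value_lowercase_token, matches) and the
sample records are hypothetical, made up for this illustration.

```python
def whitespace_lowercase_tokens(text):
    """Mimic splitting on whitespace and then lowercasing each token."""
    return [token.lower() for token in text.split()]


def whole_value_lowercase_token(text):
    """Mimic the desired behaviour: one lowercased token for the whole value."""
    return [text.lower()]


def matches(query, value, analyze):
    """Return True if every analyzed query token appears among the value's tokens."""
    value_tokens = set(analyze(value))
    return all(token in value_tokens for token in analyze(query))


# Hypothetical sample records, taken from the example in this message.
records = ["One", "One Way"]

# Splitting on whitespace: a search for 'One' also hits 'One Way'.
split_hits = [r for r in records if matches("One", r, whitespace_lowercase_tokens)]

# Keeping the whole value as a single token: only the exact value matches.
whole_hits = [r for r in records if matches("One", r, whole_value_lowercase_token)]

print(split_hits)
print(whole_hits)
```

The second list is the behaviour I'm after: the match is case-insensitive,
but the whole value, white space included, is treated as one unit.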

Is there another way to achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin
