lucene-java-user mailing list archives

From "david.w.smiley@gmail.com" <david.w.smi...@gmail.com>
Subject Quiz question: Which Character.isSpaceChar but not isWhitespace?
Date Fri, 30 Oct 2015 20:10:53 GMT
One would think that all “space characters” are by definition
“whitespace”. Not true! See U+00A0 (NO-BREAK SPACE):
http://www.fileformat.info/info/unicode/char/00a0/index.htm
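
To answer the quiz directly, here is a minimal sketch that scans every Unicode code point and collects those for which Character.isSpaceChar is true but Character.isWhitespace is false. The class and method names (SpaceCharQuiz, spaceButNotWhitespace) are made up for illustration. The Javadoc for Character.isWhitespace explicitly excludes the non-breaking spaces U+00A0, U+2007, and U+202F, so those are the ones to expect in the result.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceCharQuiz {

    /** Collects every code point that is a "space char" but not "whitespace". */
    static List<Integer> spaceButNotWhitespace() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isSpaceChar(cp) && !Character.isWhitespace(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each offending code point with its Unicode name.
        for (int cp : spaceButNotWhitespace()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```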

So I’m working on an app where I can no longer use WhitespaceTokenizer
since I need to check for isSpaceChar *OR* isWhitespace.  Alternatively,
I realize I could use MappingCharFilter to normalize those characters
before tokenizing.
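
The combined check can be sketched in plain Java, without any Lucene dependency. This is only an illustration of the predicate, not the tokenizer from the app described above: AnySpaceSplitter, isAnySpace, and split are hypothetical names. In Lucene the same predicate would presumably go into a CharTokenizer subclass whose isTokenChar(int) returns the negation of isAnySpace.

```java
import java.util.ArrayList;
import java.util.List;

public class AnySpaceSplitter {

    /** True for anything Java treats as a space char OR as whitespace. */
    static boolean isAnySpace(int cp) {
        return Character.isSpaceChar(cp) || Character.isWhitespace(cp);
    }

    /** Splits text into tokens on any run of space or whitespace code points. */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step by code point, not by char
            if (isAnySpace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A no-break space, a plain space, and a tab all act as separators.
        System.out.println(split("foo\u00A0bar baz\tqux")); // prints [foo, bar, baz, qux]
    }
}
```

Splitting on isWhitespace alone would leave "foo" and "bar" glued together around the no-break space, which is the kind of surprise described here.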

This had trickle-down effects on a search platform I’m working on,
triggered by a user’s search.  It caused all sorts of head-scratching
until we discovered what was going on.

Craziness.

~ David
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
