lucene-java-user mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject underscore a word separator in StandardAnalyzer?
Date Sat, 14 Mar 2009 22:36:28 GMT

Hello fellows of Lucene,

I just discovered that the _ character is treated as a word separator by
the StandardAnalyzer.
Can that be right?
It broke our usage of a field that stores a comma-separated list of
"uri-fragments" which, of course, contain _: the StandardAnalyzer
splits each of these into separate terms, which makes the search
hopelessly fuzzy.
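
To make the effect concrete, here is a minimal sketch in plain Java. It uses no Lucene classes and does not reproduce the StandardAnalyzer grammar: the second method simply breaks on every non-alphanumeric character, as a crude stand-in for the behaviour described above. The class name, method names, and sample field value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class UnderscoreSplitDemo {

    // Splits on commas only, so each uri-fragment survives as a single term.
    static List<String> splitOnCommas(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    // Breaks on every character that is not a letter or digit.
    // Assumption: this only imitates the reported splitting on '_';
    // it is NOT the actual StandardAnalyzer tokenization.
    static List<String> splitOnNonAlphanumerics(String value) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical field value: two uri-fragments containing underscores.
        String field = "intro_to_sets,def_group";
        System.out.println(splitOnCommas(field));           // [intro_to_sets, def_group]
        System.out.println(splitOnNonAlphanumerics(field)); // [intro, to, sets, def, group]
    }
}
```

With the second kind of splitting, a query for one fragment also matches every other fragment that shares any of its pieces, which is the fuzziness I mean.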

Is there any rationale for this? Was there a past debate about it?
I would think my naive view is rather common: an underscore makes a
new word out of existing words, while a dash makes compound words.

I know I can try to adapt the StandardAnalyzer myself! I just wanted to
know the reasons.

paul