lucene-solr-user mailing list archives

From Sebastian M <>
Subject default RegexFragmenter
Date Tue, 11 Jan 2011 16:22:01 GMT


I'm investigating an issue where spellcheck queries are tokenized without
being explicitly told to do so, resulting in suggestions such as
"" for the queries such as

The default RegexFragmenter (name="regex") uses the regular expression:

[-\w ,/\n\"']{20,200}

I understand parts of it, but I'm not sure about the - sign or the slash
midway through it.
I would like to tailor this regular expression so that query terms such
as "" are not broken down on the period marks, but are kept as they are.
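
For reference, here is a minimal sketch of how the character class in that pattern behaves. It is written in Python rather than Java, on the assumption that this particular pattern reads the same way in both regex dialects (re.ASCII is used so that \w is limited to [A-Za-z0-9_], as in Java's default mode). The names allowed, DEFAULT_PATTERN, WITH_PERIOD and the sample string are made up for illustration and are not part of Solr. Whether this pattern is what actually splits the spellcheck terms is not confirmed by anything in this message.

```python
import re

# The character class from the pattern quoted above. A '-' placed first in a
# class is a literal hyphen (not a range), and '/' has no special meaning
# inside a class, so both are simply allowed characters.
CHAR_CLASS = r"""[-\w ,/\n\"']"""

# The full pattern as quoted: a run of 20 to 200 allowed characters.
DEFAULT_PATTERN = re.compile(CHAR_CLASS + r"{20,200}", re.ASCII)

# Hypothetical variant (an assumption, not a Solr default): the same class
# with a literal period added, so runs are no longer cut at '.'.
WITH_PERIOD = re.compile(r"""[-\w ,/\n\"'.]{20,200}""", re.ASCII)


def allowed(ch):
    """Return True if the single character ch is in the quoted class."""
    return re.fullmatch(CHAR_CLASS, ch, re.ASCII) is not None


# Illustrative text only; not taken from the original message.
sample = "see the page at www.example.com/docs for well-known, up-to-date notes"

if __name__ == "__main__":
    for ch in "-/.":
        print(repr(ch), "allowed" if allowed(ch) else "not allowed")
    print(DEFAULT_PATTERN.findall(sample))
    print(WITH_PERIOD.findall(sample))
```

In this sketch the hyphen and the slash are ordinary members of the class, while the period is not, so any run of matching characters ends at a period. Adding the period to the class, as in the hypothetical WITH_PERIOD variant, lets a run continue across it.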

Any suggestions or answers are highly appreciated!
