lucene-solr-user mailing list archives

From Sebastian M <mihais...@yahoo.com>
Subject default RegexFragmenter
Date Tue, 11 Jan 2011 16:22:01 GMT

Hello,

I'm investigating an issue where spellcheck queries are tokenized without
being explicitly told to do so, resulting in suggestions such as
"www.www.product4sale.com.com" for queries such as
"www.product4sale.com".

The default RegexFragmenter (name="regex") uses the regular
expression:

[-\w ,/\n\"']{20,200}

I understand parts of it, but I'm not sure what the leading - sign or the
slash midway through it are for.
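In standard regex syntax, a - placed first inside [...] is a literal hyphen rather than a range operator, and / has no special meaning, so it is simply a literal slash. The class therefore accepts word characters (\w, i.e. letters, digits and underscore), hyphen, space, comma, slash, newline, double quote and apostrophe, and {20,200} asks for a run of 20 to 200 such characters. A minimal sketch, assuming Java's java.util.regex behaves like the engine the fragmenter uses (the class name FragmenterCharClassDemo is hypothetical), that tests single characters against the class:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: checks which characters belong to the
// character class of the pattern quoted above (quantifier dropped).
public class FragmenterCharClassDemo {
    // '-' first in the class is a literal hyphen; '/' is a literal slash;
    // '\n' is a newline; '\"' and '\'' are the two quote characters.
    public static final Pattern CHAR_CLASS = Pattern.compile("[-\\w ,/\\n\\\"']");

    public static boolean inClass(char c) {
        return CHAR_CLASS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'-', '/', 'a', '4', '_', ',', '"', '\'', '.', ':'}) {
            System.out.println("'" + c + "' -> " + inClass(c));
        }
    }
}
```

Note that the period is not in the class, so any run of matching text ends at a '.'.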
I would like to tailor this regular expression so that query terms such as
"www.product4sale.com" are not broken apart at the periods but are kept
as they are.
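One possible tweak is to add a literal period to the character class. The sketch below is a hypothetical variant, not a tested Solr configuration, and it only shows how the pattern itself matches in Java's java.util.regex; it does not show whether the fragmenter is what splits the spellcheck terms. "www.product4sale.com" is exactly 20 characters, but its period-free runs ("www", "product4sale", "com") are all shorter than 20, so the original pattern finds nothing. With '.' added, the whole term matches as one run. The class and constant names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the original pattern with a variant
// that also accepts '.' inside the character class.
public class FragmenterRegexDemo {
    // The pattern as quoted in the message, written as a Java string literal.
    public static final String ORIGINAL = "[-\\w ,/\\n\\\"']{20,200}";
    // Assumed variant: same class plus a literal '.'.
    public static final String WITH_PERIOD = "[-\\w ,/\\n\\\"'.]{20,200}";

    // Collects every non-overlapping match of regex in text.
    public static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String term = "www.product4sale.com"; // 20 characters
        System.out.println(matches(ORIGINAL, term));    // []
        System.out.println(matches(WITH_PERIOD, term)); // [www.product4sale.com]
    }
}
```

Adding '.' also means fragments no longer end at sentence-ending periods, so the variant trades sentence-aligned fragments for intact host names.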

Any suggestions or answers are highly appreciated!

Sebastian
-- 
View this message in context: http://lucene.472066.n3.nabble.com/default-RegexFragmenter-tp2235106p2235106.html
Sent from the Solr - User mailing list archive at Nabble.com.
