lucene-solr-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets
Date Sat, 12 May 2007 07:36:30 GMT

: After 1/2 hour of regex hacking... I think I'll stick with a two step
: process: split then trim ;)

But regex hacking is FUN!!

I'm 99% certain this does what you want...

        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="((\A\s*)|\s*?(\s+-\s+|--|,|\(|\))|\s+)\s*\z?" />

...if it doesn't, send me an example string that it fails on and tell me
what the desired output is.

Incidentally, PatternTokenizerFactory seems to have the annoying limitation
of assuming there is a token prior to each match -- even if the match
explicitly matches on the start of the string (so it creates a 0 width
token) ... that seems like a bug, right?
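
A minimal sketch for trying the pattern outside Solr, assuming plain
java.util.regex and a naive split loop that emits whatever text precedes
each match. The class and method names (PatternSplitSketch, naiveSplit) are
hypothetical, and the loop only imitates the behavior described above; it
is not Solr's actual PatternTokenizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone harness; not part of Solr.
class PatternSplitSketch {
    // The delimiter pattern from the message, escaped for a Java string literal.
    static final Pattern DELIM = Pattern.compile(
        "((\\A\\s*)|\\s*?(\\s+-\\s+|--|,|\\(|\\))|\\s+)\\s*\\z?");

    // Naive split: assume there is a token before every delimiter match,
    // even when the match sits at offset 0 (the suspected bug).
    static List<String> naiveSplit(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DELIM.matcher(input);
        int last = 0;
        while (m.find()) {
            // Text between the previous match and this one; may be empty.
            tokens.add(input.substring(last, m.start()));
            last = m.end();
        }
        // Keep any text left after the final delimiter.
        if (last < input.length()) {
            tokens.add(input.substring(last));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [, apple, banana, cherry]
        System.out.println(naiveSplit("  apple, banana -- cherry  "));
        // prints [, apple, banana]
        System.out.println(naiveSplit("apple, banana"));
    }
}
```

In both runs the first element is an empty string: the \A\s* alternative
always matches at offset 0, so a loop that blindly emits the text before
each match produces a zero-width leading token. Skipping empty tokens (or
matches that start at offset 0) in the loop would be one way to avoid it.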




-Hoss

