lucene-java-user mailing list archives

From "Philip Puffinburger" <ppuffinbur...@tlcdelivers.com>
Subject RE: 2.3.2 -> 2.4.0 StandardTokenizer issue
Date Sat, 21 Feb 2009 17:19:15 GMT
Thanks for the suggestion. We're going to go over all of this information and these suggestions
next week to see what we want to do.

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Saturday, February 21, 2009 11:52 AM
To: java-user@lucene.apache.org
Subject: Re: 2.3.2 -> 2.4.0 StandardTokenizer issue

that was just a suggestion as a quick hack...

it still won't really fix the problem because some character + accent
combinations don't have composed forms.

even if you added the entire Combining Diacritical Marks block to the JFlex
grammar, it's still wrong... what needs to be supported is the
\p{Word_Break = Extend} property, etc.
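
To illustrate the point about missing composed forms, here is a minimal sketch using java.text.Normalizer from the JDK. The class name ComposedFormCheck and the helper nfcCodePoints are made up for this example and are not part of Lucene. It assumes the input is composed to NFC before tokenizing: "e" + U+0301 collapses into the single code point U+00E9, but "q" + U+0303 has no precomposed form, so the combining mark remains a separate code point that the tokenizer still has to handle.

```java
import java.text.Normalizer;

// Hypothetical helper class for illustration only; not part of Lucene.
class ComposedFormCheck {

    // Returns how many code points remain after NFC composition.
    static int nfcCodePoints(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301"; // e + COMBINING ACUTE ACCENT, composes to U+00E9
        String qTilde = "q\u0303"; // q + COMBINING TILDE, no precomposed form exists

        System.out.println(nfcCodePoints(eAcute)); // prints 1
        System.out.println(nfcCodePoints(qTilde)); // prints 2
    }
}
```

So normalization can only be a partial workaround: any base character + accent pair without a composed form still reaches the tokenizer as two code points.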

