lucene-java-user mailing list archives

From "Philip Puffinburger" <>
Subject RE: 2.3.2 -> 2.4.0 StandardTokenizer issue
Date Sat, 21 Feb 2009 17:19:15 GMT
Thanks for the suggestion. We're going to go over all of this information and these suggestions
next week to decide what we want to do.

-----Original Message-----
From: Robert Muir [] 
Sent: Saturday, February 21, 2009 11:52 AM
Subject: Re: 2.3.2 -> 2.4.0 StandardTokenizer issue

that was just a suggestion as a quick hack...

it still won't really fix the problem because some character + accent
combinations don't have composed forms.
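
A minimal sketch of that point, using the JDK's java.text.Normalizer rather than anything from Lucene. The class and method names are made up for illustration. NFC composes 'e' + U+0301 into the single code point U+00E9, but 'q' + U+0301 has no precomposed form, so the combining mark is still a separate character after normalization.

```java
import java.text.Normalizer;

// Illustration only: class and method names are hypothetical, not Lucene API.
final class ComposedFormDemo {

    // NFC composes a base character with its combining marks wherever
    // Unicode defines a precomposed code point, and leaves the rest alone.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        String qAcute = "q\u0301"; // 'q' followed by COMBINING ACUTE ACCENT

        // 'e' + acute composes to the single code point U+00E9.
        System.out.println(toNfc(eAcute).length()); // prints 1

        // 'q' + acute has no composed form, so two chars remain.
        System.out.println(toNfc(qAcute).length()); // prints 2
    }
}
```

So normalizing to NFC before tokenizing only helps for the sequences that happen to have a precomposed code point.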

even if you added the entire combining diacritical marks block to the jflex
grammar, it's still wrong... what needs to be supported is the
\p{Word_Break = Extend} property, etc.
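
A rough sketch of why extending characters matter for word splitting. It uses plain java.util.regex, not the StandardTokenizer grammar, and all names are hypothetical. It also uses the general category \p{M} (combining marks) as a stand-in, which is an assumption for illustration: \p{M} is not the same set as Word_Break = Extend, and the real UAX #29 rules cover more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical names, not the Lucene/JFlex grammar.
final class ExtendAwareSplitDemo {

    // Naive words: runs of letters and digits; a combining mark ends the run.
    private static final Pattern NAIVE =
            Pattern.compile("[\\p{L}\\p{N}]+");

    // Mark-aware words: start with a letter or digit, then allow combining
    // marks (\p{M}, a rough stand-in for Word_Break = Extend) inside the run.
    private static final Pattern MARK_AWARE =
            Pattern.compile("[\\p{L}\\p{N}][\\p{L}\\p{N}\\p{M}]*");

    // Collect every match of the pattern, in order.
    private static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    static List<String> naiveWords(String text) {
        return findAll(NAIVE, text);
    }

    static List<String> markAwareWords(String text) {
        return findAll(MARK_AWARE, text);
    }
}
```

For the input "q\u0301uiz show", naiveWords returns three pieces ("q", "uiz", "show") because the combining accent breaks the word, while markAwareWords keeps "q\u0301uiz" together and returns two.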
