lucene-java-user mailing list archives

From Robert Muir
Subject Re: which unicode version is supported with lucene
Date Fri, 25 Feb 2011 13:06:18 GMT
On Fri, Feb 25, 2011 at 6:04 AM, Simon Willnauer wrote:

> Since 3.0 is a Java Generics / move to Java 1.5 only release these
> APIs are not in use yet in the latest released version. Lucene 3.1
> holds a largely converted Analyzer / TokenFilter / Tokenizer codebase
> (I think there are one or two which still have problems, I should
> check... Robert did we fix all NGram stuff?).
No... and honestly they have other serious problems (such as only looking at
the first 1024 chars of the input document; see the JIRA issues). I
recommend against using them in general, and especially if you have
codepoints outside of the BMP...
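
To illustrate why codepoints outside of the BMP are a problem for n-grams in general, here is a minimal sketch in plain Java. It does not use or reproduce the Lucene NGram classes; the class and method names (CodePointNGrams, charNGrams, codePointNGrams, hasLoneSurrogate) are made up for this example. It only shows that slicing by UTF-16 char index can cut a surrogate pair in half, while stepping by code point does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene NGram implementation.
final class CodePointNGrams {

    // Naive approach: slice by UTF-16 char index.
    // A supplementary character occupies two chars, so a slice may
    // contain only one half of a surrogate pair.
    static List<String> charNGrams(String s, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            out.add(s.substring(i, i + n));
        }
        return out;
    }

    // Code-point-aware approach: every gram holds exactly n code points.
    static List<String> codePointNGrams(String s, int n) {
        List<String> out = new ArrayList<>();
        int count = s.codePointCount(0, s.length());
        int start = 0; // char index of the first code point of the gram
        for (int i = 0; i + n <= count; i++) {
            int end = s.offsetByCodePoints(start, n);
            out.add(s.substring(start, end));
            start = s.offsetByCodePoints(start, 1); // advance one code point
        }
        return out;
    }

    // True if the string contains a surrogate that is not part of a
    // well-formed high/low pair.
    static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                    continue;
                }
                return true;
            }
            if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }
}
```

For an input such as "a\uD835\uDD0Ab" (three code points, four chars), charNGrams with n = 2 yields grams that begin or end with half of a surrogate pair, while codePointNGrams keeps the supplementary character whole in every gram.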
