lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Muir <rcm...@gmail.com>
Subject Re: which unicode version is supported with lucene
Date Fri, 25 Feb 2011 13:06:18 GMT
On Fri, Feb 25, 2011 at 6:04 AM, Simon Willnauer <
simon.willnauer@googlemail.com> wrote:

> Since 3.0 is a Java Generics / move to Java 1.5 only release these
> APIs are not in use yet in the latest released version. Lucene 3.1
> holds a largely converted Analyzer / TokenFilter / Tokenizer codebase
> (I think there are one or two which still have problems, I should
> check... Robert did we fix all NGram stuff?).
>
>
No... and honestly they have other serious problems (such as only looking at
first 1024 chars of input in the document, look at the jira issues). I
recommend against using them in general, but definitely if you have
codepoints outside of the BMP...

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message