Chuck Williams wrote:
> For lazy fields, there would be a substantial benefit to having the
> count on a String be an encoded byte count rather than a Java char
> count, but this has the same problem. If there is a way to beat this
> problem, then I'd start arguing for a byte count.

I think the way to beat it is to keep things as bytes as long as possible. For example, each term in a Query needs to be converted from String to byte[], but after that all search computation could happen by comparing byte arrays. (Note that lexicographic comparisons of UTF-8 encoded bytes give the same results as lexicographic comparisons of Unicode character strings, compared code point by code point.) And, when indexing, each Token would need to be converted from String to byte[] just once.

The Java API can easily be made back-compatible. The harder part would be making the file format back-compatible.

Doug
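A minimal sketch of the byte-comparison idea, for illustration only. The class and method names (Utf8TermCompare, toUtf8, compareBytes, compareCodePoints) are hypothetical and are not part of any Lucene API. It assumes each term is encoded to UTF-8 once and afterwards compared as unsigned bytes. One caveat: the resulting order matches Unicode code point order, which differs from Java's String.compareTo (UTF-16 code unit order) when supplementary characters are involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Lucene class.
final class Utf8TermCompare {

    /** Encode a term to UTF-8 once, e.g. when a query is parsed or a token is indexed. */
    static byte[] toUtf8(String term) {
        return term.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison that treats each byte as unsigned (0..255). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    /** Reference ordering: compare two strings code point by code point. */
    static int compareCodePoints(String a, String b) {
        int[] ca = a.codePoints().toArray();
        int[] cb = b.codePoints().toArray();
        int n = Math.min(ca.length, cb.length);
        for (int i = 0; i < n; i++) {
            if (ca[i] != cb[i]) {
                return Integer.compare(ca[i], cb[i]);
            }
        }
        return Integer.compare(ca.length, cb.length);
    }

    public static void main(String[] args) {
        // Sample terms: ASCII, Latin-1, CJK, a supplementary character, and U+FFFD.
        String[] terms = {
            "apple", "apples", "zebra", "\u00e9clair",
            "\u4e2d\u6587", "\uD83D\uDE00", "\uFFFD"
        };
        for (String s : terms) {
            for (String t : terms) {
                int byBytes = Integer.signum(compareBytes(toUtf8(s), toUtf8(t)));
                int byCodePoints = Integer.signum(compareCodePoints(s, t));
                if (byBytes != byCodePoints) {
                    throw new AssertionError(s + " vs " + t);
                }
            }
        }
        System.out.println("byte order matches code point order for all sample pairs");
    }
}
```

The mask with 0xFF matters: Java bytes are signed, so comparing them directly would sort every non-ASCII lead byte before ASCII.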