lucene-dev mailing list archives

From: "Robert Engels" <reng...@ix.netcom.com>
Subject: RE: Lucene does NOT use UTF-8.
Date: Tue, 30 Aug 2005 18:04:06 GMT

Not true. You do not need to pre-scan it.

When you use a CharsetEncoder, it writes the bytes to a buffer (expanding
as needed). Once encoding is finished, you can get the actual number of
bytes needed.

The pseudo-code is

use CharsetEncoder.encode() to encode the String into a ByteBuffer
write VInt using ByteBuffer.remaining()
write bytes using ByteBuffer.array()

Better yet, use NIO so you can pass the ByteBuffer directly.
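
The pseudo-code above could look roughly like this in Java. This is only a
sketch: the class name Utf8StringWriter and the writeVInt helper are made up
for illustration and are not Lucene's actual code, and it assumes UTF-8 as the
target charset and a plain OutputStream as the sink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8StringWriter {

    // Hypothetical helper: variable-length int, 7 bits per byte, low-order
    // bits first, high bit set on every byte except the last.
    static void writeVInt(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode first, then write the byte count as the length prefix.
    static void writeString(OutputStream out, String s) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        // encode() allocates and grows the ByteBuffer as needed; it throws
        // CharacterCodingException (an IOException) on malformed input.
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(s));
        // remaining() is the number of encoded bytes, not String.length().
        writeVInt(out, bytes.remaining());
        out.write(bytes.array(),
                  bytes.arrayOffset() + bytes.position(),
                  bytes.remaining());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "h\u00e9llo"); // 5 chars, 6 bytes in UTF-8
        System.out.println(out.size()); // prints 7: 1 length byte + 6 data bytes
    }
}
```

Note that encode() still holds the whole encoded string in memory before the
length is written, so this avoids a pre-scan but not the buffering.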


-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com]
Sent: Tuesday, August 30, 2005 12:56 PM
To: java-dev@lucene.apache.org; rengels@ix.netcom.com
Subject: Re: Lucene does NOT use UTF-8.


> I think you guys are WAY overcomplicating things, or you just don't know
> enough about the Java class libraries.


People were just pointing out that if the vint isn't String.length(), then
one has to either buffer the entire string, or pre-scan it.

It's a valid point, and CharsetEncoder doesn't change that.

-Yonik
Now hiring -- http://tinyurl.com/7m67g



