lucene-dev mailing list archives

From joanne.spros...@teamware.co.uk
Subject Re: Added comments to InputStream and OutputStream
Date Fri, 12 Oct 2001 10:13:00 GMT
I asked my colleague your question about Unicode and bytes. This was his reply:

Unicode is 16 bits.  UTF-8 needs 1 byte for a 7-bit character (ASCII),
2 bytes for an 11-bit character (including ISO-8859-1), and 3 bytes for
a 16-bit character.

       DaveS
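
The byte counts above can be checked with a small Java sketch. It is
illustrative only and is not taken from the Lucene sources: the class name
Utf8Lengths and the helper utf8Length are hypothetical. It assumes standard
UTF-8 and single 16-bit Java chars; surrogate pairs (characters beyond
16 bits, which take 4 bytes in standard UTF-8) are deliberately not handled.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Hypothetical helper: UTF-8 bytes needed for one 16-bit char.
    // Assumes c is not a surrogate (i.e. it is a complete character).
    static int utf8Length(char c) {
        if (c <= 0x7F) {
            return 1;   // fits in 7 bits (ASCII)
        } else if (c <= 0x7FF) {
            return 2;   // fits in 11 bits (covers ISO-8859-1)
        } else {
            return 3;   // anything else up to 16 bits
        }
    }

    public static void main(String[] args) {
        // 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC)
        char[] samples = { 'A', '\u00E9', '\u20AC' };
        for (char c : samples) {
            // Cross-check the rule against the JDK's own UTF-8 encoder.
            int encoded = String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " rule=" + utf8Length(c) + " encoder=" + encoded);
        }
    }
}
```

For the three sample characters, the rule and the JDK encoder both give
1, 2 and 3 bytes respectively.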


Joanne



Dmitry Serebrennikov  (11/10/2001  18:44):
>I figured that I might as well be adding comments as I am reading and
>figuring out the code.
>One thing I was not clear on - characters are stored with 1 to 3 bytes.
>Is that sufficient to represent all Unicode characters? I thought
>Unicode was four bytes.
>
>Index: InputStream.java
>===================================================================
>RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/store/InputStream.java,v
>retrieving revision 1.1.1.1

