lucene-dev mailing list archives

Subject Re: Added comments to InputStream and OutputStream
Date Fri, 12 Oct 2001 10:13:00 GMT
I asked my colleague your question about Unicode and bytes. This was his reply:

Unicode is 16 bits.  UTF-8 needs 1 byte for a 7-bit character (ASCII),
2 bytes for an 11-bit character (including ISO-8859-1), and 3 bytes for
a 16-bit character.
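
The byte counts in the reply can be sketched in Java as below. This is only an illustration of the standard UTF-8 length rule for a single 16-bit char value, not the actual InputStream/OutputStream code; the class name Utf8Length and the method utf8Length are made up for the example. Code points above 16 bits, which standard UTF-8 encodes in 4 bytes, are outside what this sketch covers.

```java
public class Utf8Length {
    /**
     * Hypothetical helper: number of bytes standard UTF-8 uses
     * for one 16-bit value, following the ranges in the reply.
     */
    static int utf8Length(char c) {
        if (c <= 0x7F) {
            return 1;   // fits in 7 bits (ASCII)
        }
        if (c <= 0x7FF) {
            return 2;   // fits in 11 bits (covers ISO-8859-1)
        }
        return 3;       // anything else up to 16 bits
    }

    public static void main(String[] args) {
        // 'A' (7 bits), e-acute (8 bits), euro sign (14 bits)
        char[] samples = {'A', '\u00E9', '\u20AC'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", (int) c, utf8Length(c));
        }
    }
}
```

The three samples fall into the 1-byte, 2-byte and 3-byte ranges respectively.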



Dmitry Serebrennikov  (11/10/2001  18:44):
>I figured that I might as well be adding comments as I am reading and
>figuring out the code.
>One thing I was not clear on - characters are stored with 1 to 3 bytes.
>Is that sufficient to represent all Unicode characters? I thought
>Unicode was four bytes.
