ibatis-user-java mailing list archives

From Brandon Goodin <brandon.goo...@gmail.com>
Subject Re: Help needed...
Date Wed, 20 Apr 2005 16:44:12 GMT
For more super neato reading!

http://www.unicode.org/faq/utf_bom.html

On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> From the link you sent, it appears that UTF-8, UTF-16, and UTF-32 are
> all just varying ways of storing the same glyphs. The reason there are
> variations is because some systems need to process characters/glyphs
> in single byte increments, so UTF-8 is good for that. Then, some
> systems handle the double-byte UTF-16 just fine, so that's great ...
> it was the original format, anyhow. UTF-32 came about, apparently,
> because UTF-16 filled up, so they created 'surrogates' which is
> essentially two UTF-16 characters ... so, some systems can't
> understand these surrogates, so UTF-32 is a fixed representation of
> the full UCS-4 data space.
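A minimal Java sketch of the surrogate mechanism Brice describes, assuming Java 8 or later (the class SurrogateDemo is hypothetical). A Java String is stored as UTF-16 code units, so a character above U+FFFF (here U+20000, from the CJK Extension B block) takes two chars, a high and a low surrogate, but still counts as one code point. UTF-32, by contrast, stores every code point in a single 32-bit unit.

```java
// Hypothetical demo class, not part of any library.
final class SurrogateDemo {

    // Returns {UTF-16 code units, Unicode code points} for the given text.
    static int[] unitsAndCodePoints(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        // U+20000 lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
        String ext = new String(Character.toChars(0x20000));
        int[] counts = unitsAndCodePoints(ext);
        System.out.println("code units:  " + counts[0]); // 2
        System.out.println("code points: " + counts[1]); // 1
        System.out.println(Character.isHighSurrogate(ext.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(ext.charAt(1)));  // true
        // codePointAt recombines the pair into the original code point.
        System.out.printf("U+%X%n", ext.codePointAt(0)); // U+20000
    }
}
```

Code that walks a String char by char therefore sees two "characters" here; code-point APIs such as codePointAt and codePointCount see one.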
> 
> Now, most folks aren't using UTF-32 because it's HUGE for data storage
> ... imagine all your databases doubling in size, effectively (if you were
> using UTF-16 already) - or quadrupling (if you used UTF-8 and you
> averaged 1.1 bytes or so for your representations). Big difference.
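To put rough numbers on the storage argument, here is a sketch that measures the same strings in each form, assuming Java 7 or later for StandardCharsets. UTF-16 is measured with UTF_16BE so that no byte-order mark is counted, and UTF-32 is computed as 4 bytes per code point rather than through a charset lookup. The class EncodedSize and the sample strings are illustrative only. The ratio depends on the text: ASCII takes 1 byte per character in UTF-8, while common CJK characters take 3 bytes in UTF-8 against 2 in UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: compares encoded sizes of the same text.
final class EncodedSize {

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE rather than UTF_16, so no byte-order mark is added to the count.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed width: 4 bytes per code point (no BOM counted).
    static int utf32Bytes(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello",                                // ASCII
            "cora\u00e7\u00e3o",                    // Portuguese word with accents
            "\u4e2d\u6587",                         // two Chinese characters
            new String(Character.toChars(0x20000))  // one supplementary CJK character
        };
        for (String s : samples) {
            System.out.printf("%s: UTF-8=%d UTF-16=%d UTF-32=%d%n",
                    s, utf8Bytes(s), utf16Bytes(s), utf32Bytes(s));
        }
    }
}
```

For "Hello" this reports 5, 10 and 20 bytes; for the two Chinese characters it reports 6, 4 and 8, so UTF-8 is not always the smallest.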
> 
> Anyhow, that's what I got out of the informative link Brandon sent ...
> it's a good read, for sure. What I didn't get from it is any difference
> in *what* can be represented between UTF-8, 16, and 32.
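On the question of *what* can be represented, an exhaustive check is cheap. This sketch (class name hypothetical, Java 7 or later) relies on a Java String being UTF-16 internally. It encodes every Unicode scalar value to UTF-8, decodes it back, and compares the result with the original UTF-16 string. Surrogate code points U+D800..U+DFFF are skipped because no UTF can encode them on their own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical exhaustive check: every character that fits in a UTF-16
// String survives a UTF-8 encode/decode round trip unchanged.
final class Utf8CoversUtf16 {

    // Returns {round-trip failures, longest UTF-8 encoding in bytes}.
    static int[] check() {
        int failures = 0;
        int maxBytes = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip U+D800..U+DFFF: surrogate code points are not characters.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            String utf16 = new String(Character.toChars(cp));
            byte[] utf8 = utf16.getBytes(StandardCharsets.UTF_8);
            maxBytes = Math.max(maxBytes, utf8.length);
            if (!utf16.equals(new String(utf8, StandardCharsets.UTF_8))) {
                failures++;
            }
        }
        return new int[] { failures, maxBytes };
    }

    public static void main(String[] args) {
        int[] result = check();
        System.out.println("failures = " + result[0] + ", max UTF-8 bytes = " + result[1]);
    }
}
```

check() returns {0, 4}: no character is lost, and none needs more than 4 bytes in UTF-8.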
> 
> Cheers!
> 
> On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > I did work with a Japanese site and we used Shift_JIS which is a UTF-8
> > extension. We would store Shift_JIS into the database but then we had
> > some issues reading the stored data from the database. The characters
> > were entered as Shift_JIS and stored as UCS-2 (UTF-16) in SQL Server.
> > We tried reading them straight from the database and displaying them
> > on screen without any byte encoding conversion. But, they wound up
> > looking all wrong. The browser did not handle the conversion properly.
> > We then read the data from the database and used the Java
> > String.getBytes(String charsetName) method to reset the encoding.
> > However, the Java String.getBytes method did not work properly. We
> > wound up writing our own conversion that was quite simple and
> > everything worked. So, as far as I know, all the glyph representations
> > that are available in UTF-8 are available to UTF-16 and it is possible
> > to convert back and forth between the two so long as a glyph does not
> > exceed UTF-8 glyph storage size. But, I think UTF-16 has the potential
> > to store more complex glyphs. Maybe I'm wrong. But, that is my
> > impression with all of this.
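The message doesn't say exactly what went wrong with getBytes. One common cause is that the text had already been decoded with the wrong charset, in which case re-encoding the damaged String cannot fix it by itself. (Shift_JIS, for the record, is a separate legacy Japanese encoding, not a form of UTF-8.) What follows is a minimal sketch, assuming UTF-8 bytes that were mistakenly decoded as ISO-8859-1. The class and method names are hypothetical, and this is not the custom conversion Brandon's team wrote.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example of "mojibake" (text decoded with the wrong charset) and its repair.
final class Mojibake {

    // Simulates the bug: UTF-8 bytes decoded as if they were ISO-8859-1.
    static String decodeWrongly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so encoding the
    // garbled String with it restores the original bytes; those are then
    // decoded with the charset they were really written in.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e";               // Japanese "nihongo"
        String garbled = decodeWrongly(text);
        System.out.println(garbled.length());             // 9: one char per UTF-8 byte
        System.out.println(repair(garbled).equals(text)); // true
    }
}
```

The repair only works because ISO-8859-1 decoding is lossless for every byte. If the bytes had been decoded with a charset that replaces invalid input (UTF-8 itself, for example), the original bytes would be gone.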
> >
> > Brandon
> >
> > On 4/20/05, Miquel Angel Bada Zuazo <mabada@gmail.com> wrote:
> > > UTF-8 is for almost all languages (it uses 8 bits to represent a
> > > letter, I think), but "complicated" languages such as Japanese and Thai
> > > use 16 bits, which is why UTF-16 exists at all.
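For comparison with Miquel's description: UTF-8 is variable-width, not 8 bits per letter. Each code point takes 1 to 4 bytes depending on its range, as defined in RFC 3629, so Thai and Japanese letters simply take 3 bytes each. Below is a minimal sketch of that rule, checked against Java's own encoder. The class Utf8Length is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the UTF-8 length rule by code point range.
final class Utf8Length {

    // Bytes UTF-8 uses for one Unicode scalar value (U+0000..U+10FFFF).
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;     // U+0000..U+007F: ASCII
        if (codePoint < 0x800) return 2;    // U+0080..U+07FF: e.g. accented Latin, Arabic
        if (codePoint < 0x10000) return 3;  // U+0800..U+FFFF: e.g. Thai, kana, most CJK
        return 4;                           // U+10000..U+10FFFF: supplementary planes
    }

    // What Java's own UTF-8 encoder produces for the same code point.
    static int encoderLength(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0x0627 /* Arabic alef */, 0x0E01 /* Thai ko kai */,
                          0x3042 /* Hiragana a */, 0x20000 /* CJK Ext. B */ };
        for (int cp : samples) {
            System.out.printf("U+%04X: rule=%d encoder=%d%n",
                    cp, utf8Length(cp), encoderLength(cp));
        }
    }
}
```

For these samples the rule and the encoder agree: 1, 2, 3, 3 and 4 bytes.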
> > >
> > > Miquel Angel
> > >
> > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > I've done quite a bit with i18n working between UTF-8 and UTF-16. Even
> > > > after all that... I'm still mystified. :D Encoding is a world unto
> > > > itself. All I want is something that works :) Maybe one of these days
> > > > I'll understand more... for now it's all about trial and error.
> > > >
> > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > I don't see anywhere in there that UTF-8 cannot encode everything that
> > > > > UTF-16 and UTF-32 can ... just that the storage requirements differ?!
> > > > >
> > > > > Brice
> > > > >
> > > > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > > > http://icu.sourceforge.net/docs/papers/forms_of_unicode/
> > > > > >
> > > > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > > > I had heard that Chinese does a lot with UTF-16, but I hadn't heard
> > > > > > > about Arabic ... and I don't exactly understand why UTF-8 doesn't
> > > > > > > support that ... is it simply because their character sets keep
> > > > > > > expanding and UTF-8 is static?
> > > > > > >
> > > > > > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > > > > > Latin characters are fine. However, UTF-8 is not sufficient for several
> > > > > > > > languages like Arabic and Chinese. For their FULL range of character
> > > > > > > > representations these languages require UTF-16, and in the case of
> > > > > > > > Chinese it is pushing for UTF-32.
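Whether UTF-8 can carry Arabic and Chinese text is easy to test. The real limit lies in legacy single-byte charsets. Here is a minimal sketch using CharsetEncoder.canEncode to compare UTF-8 with ISO-8859-1 for an Arabic word, a Chinese word, and a character beyond U+FFFF. The class name and the sample strings are illustrative assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check: which charsets can represent a given piece of text?
final class CanEncodeDemo {

    // A fresh encoder per call, since CharsetEncoder instances are stateful.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0639\u0631\u0628\u064a",             // Arabic word "arabi"
            "\u4e2d\u6587",                         // Chinese "zhongwen"
            new String(Character.toChars(0x20000))  // CJK Ext. B, above U+FFFF
        };
        for (String s : samples) {
            System.out.println("UTF-8: " + canEncode(StandardCharsets.UTF_8, s)
                    + ", ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, s));
        }
    }
}
```

Each line prints "UTF-8: true, ISO-8859-1: false".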
> > > > > > > >
> > > > > > > > Brandon
> > > > > > > >
> > > > > > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > > > > > OK ... that's more reasonable. Obviously, you need to use an editor
> > > > > > > > > (such as Eclipse) that is capable of editing UTF-8 files; otherwise,
> > > > > > > > > you'll get junk and that won't be fun.
> > > > > > > > >
> > > > > > > > > Whew ... glad UTF-8 isn't compromised :)
> > > > > > > > >
> > > > > > > > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > > > > > > > I found this quote when doing a search in Google:
> > > > > > > > > >
> > > > > > > > > > --- quote ---
> > > > > > > > > >
> > > > > > > > > > Your actual problem is very typical. By default (without encoding
> > > > > > > > > > specified in the XML declaration), XML is encoded in UTF-8. If you use
> > > > > > > > > > an editor which is not encoding-aware and typically assuming an
> > > > > > > > > > ISO-8859-1 encoding, and you insert characters such as accented
> > > > > > > > > > letters, curly quotes, etc., you will get this error. As a workaround,
> > > > > > > > > > you can put an XML declaration with the ISO-8859-1 encoding at the top
> > > > > > > > > > of your XML file:
> > > > > > > > > >
> > > > > > > > > > <?xml version="1.0" encoding="ISO-8859-1"?>
> > > > > > > > > >
> > > > > > > > > > You can also use an editor which knows how to handle UTF-8.
> > > > > > > > > >
> > > > > > > > > > In your case it is also possible that somebody inserted incorrect
> > > > > > > > > > characters by accident, and you can just remove those and then decide
> > > > > > > > > > which encoding you want to use. UTF-8 gives you the whole range of
> > > > > > > > > > Unicode, while ISO-8859-1 gives you a limited set of characters that
> > > > > > > > > > work for the Western languages.
> > > > > > > > > >
> > > > > > > > > > --- quote ---
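The quoted advice can be reproduced with the JDK's built-in DOM parser. This is a minimal sketch, assuming Java 7 or later. The class XmlEncodingDemo and the <config> document are hypothetical. The same text is parsed three ways: first as ISO-8859-1 bytes with no declaration (the parser assumes UTF-8 and fails on the byte 0xE9 for é), then as the same bytes with an ISO-8859-1 declaration, and finally saved as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of the encoding declaration workaround.
final class XmlEncodingDemo {

    // Parses the bytes and returns the root element's text, or null if parsing fails.
    static String rootText(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) { // ParserConfigurationException, SAXException or IOException
            return null;
        }
    }

    public static void main(String[] args) {
        String body = "<config>caf\u00e9</config>";
        // 1. What a non-encoding-aware editor saves: byte 0xE9 for the accent, no declaration.
        System.out.println(rootText(body.getBytes(StandardCharsets.ISO_8859_1)));
        // 2. Same bytes, but the declaration names the encoding actually used.
        String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body;
        System.out.println(rootText(declared.getBytes(StandardCharsets.ISO_8859_1)));
        // 3. The file saved as UTF-8 (the default), so no declaration is needed.
        System.out.println(rootText(body.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The first call returns null. The other two return the text café.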
> > > > > > > > > >
> > > > > > > > > > maybe that will help,
> > > > > > > > > > Brandon
> > > > > > > > > >
> > > > > > > > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > > > > > > > What special characters aren't supported by UTF-8?! I have never heard
> > > > > > > > > > > of such a thing. My understanding is that UTF-8 represents the full
> > > > > > > > > > > Unicode character set as a multi-byte value. And since Unicode is
> > > > > > > > > > > supposed to encompass all known characters for all known languages
> > > > > > > > > > > (with space for new Chinese characters created daily) - what's not
> > > > > > > > > > > covered?!
> > > > > > > > > > >
> > > > > > > > > > > There most certainly shouldn't be anything that iso-8859-1 or latin1
> > > > > > > > > > > (Windows-1252) covers that is not in Unicode.
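Brice's point about ISO-8859-1 can be checked byte by byte. Each of its 256 values decodes to the Unicode code point with the same number (U+0000..U+00FF), a range that includes every accented letter Portuguese uses, and that text survives a UTF-8 round trip. Windows-1252 is not identical to ISO-8859-1, since it puts printable characters such as € in the 0x80..0x9F range, but those characters are in Unicode too. Below is a minimal sketch. The class name and the Portuguese sample are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check that ISO-8859-1 is a subset of Unicode and survives UTF-8.
final class Latin1InUnicode {

    static boolean allLatin1SurviveUtf8() {
        for (int b = 0; b < 256; b++) {
            byte[] latin1 = { (byte) b };
            String s = new String(latin1, StandardCharsets.ISO_8859_1);
            // ISO-8859-1 byte 0xNN decodes to code point U+00NN.
            if (s.length() != 1 || s.charAt(0) != b) {
                return false;
            }
            // Encode as UTF-8 and back, then re-encode as the original single byte.
            String back = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
            if (!Arrays.equals(back.getBytes(StandardCharsets.ISO_8859_1), latin1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allLatin1SurviveUtf8()); // true
        // Portuguese sample: "coracao, acao, voce" written with their accents.
        String pt = "cora\u00e7\u00e3o, a\u00e7\u00e3o, voc\u00ea";
        byte[] utf8 = pt.getBytes(StandardCharsets.UTF_8);
        System.out.println(pt.equals(new String(utf8, StandardCharsets.UTF_8))); // true
    }
}
```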
> > > > > > > > > > >
> > > > > > > > > > > Brice
> > > > > > > > > > >
> > > > > > > > > > > On 4/20/05, Daniel H. F. e Silva <dhfs@yahoo.com> wrote:
> > > > > > > > > > > > You could also check your XML encoding. If you work with special
> > > > > > > > > > > > characters not in UTF-8, you will get into trouble.
> > > > > > > > > > > > I had this problem, as my native language is Portuguese and we have
> > > > > > > > > > > > some special characters not supported by UTF-8.
> > > > > > > > > > > > So, if this is your case, try ISO-8859-1 or one that better fits
> > > > > > > > > > > > your needs.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > >  Daniel Silva.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --- Larry Meadors <larry.meadors@gmail.com> wrote:
> > > > > > > > > > > > > Make sure that there is no white space and no odd chars at the top of your
> > > > > > > > > > > > > config file.
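Larry's advice matters because the XML declaration must come first in the document. Even a single newline before <?xml makes the JDK parser reject the file. What follows is a minimal sketch, assuming Java 7 or later. The helper stripLeadingWhitespace and the empty <sqlMapConfig/> placeholder document are hypothetical; a real iBATIS configuration would also carry its DOCTYPE and content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: whitespace before the XML declaration is a fatal error.
final class LeadingWhitespaceDemo {

    // True if the JDK's DOM parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow instead of printing
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Hypothetical cleanup helper: drops any whitespace before the first character.
    static String stripLeadingWhitespace(String xml) {
        int start = 0;
        while (start < xml.length() && Character.isWhitespace(xml.charAt(start))) {
            start++;
        }
        return xml.substring(start);
    }

    public static void main(String[] args) {
        // Placeholder document; a real iBATIS config also has a DOCTYPE and content.
        String config = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><sqlMapConfig/>";
        String padded = "\n  " + config;  // e.g. a blank line left above the declaration
        System.out.println(parses(padded));                          // false
        System.out.println(parses(stripLeadingWhitespace(padded)));  // true
    }
}
```

The padded input fails with a "processing instruction target matching [xX][mM][lL] is not allowed" error, because the parser no longer treats <?xml as the declaration.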
> > > > > > > > > > > > >
> > > > > > > > > > > > > Larry
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 4/18/05, KK <kkn006@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I get the following error when I try to build sqlConfigMap... does it
> > > > > > > > > > > > > > look familiar to anyone?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > com.ibatis.sqlmap.client.SqlMapException: There was an error while building the SqlMap instance.
> > > > > > > > > > > > > > --- The error occurred in the SQL Map Configuration file.
> > > > > > > > > > > > > > --- Cause: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > > > Caused by: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > > > at com.ibatis.sqlmap.engine.builder.xml.XmlSqlMapClientBuilder.buildSqlMap(XmlSqlMapClientBuilder.java:203)
> > > > > > > > > > > > > > at com.ibatis.sqlmap.client.SqlMapClientBuilder.buildSqlMapClient(SqlMapClientBuilder.java:49)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Your help is greatly appreciated.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > KK
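The error above means that a byte which should have continued a three-byte UTF-8 sequence did not. This often happens when a file was saved in ISO-8859-1 or Windows-1252 but is read as UTF-8. The parser numbers the byte within the sequence, not within the file. To locate the culprit, here is a minimal diagnostic sketch using the standard CharsetDecoder API with errors reported rather than replaced. The class Utf8Validator is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: where do a file's bytes stop being valid UTF-8?
final class Utf8Validator {

    // Returns the byte offset of the first malformed sequence, or -1 if all valid.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            return in.position(); // start of the offending sequence
        }
        return -1;
    }

    public static void main(String[] args) {
        // In practice: byte[] data = java.nio.file.Files.readAllBytes(path);
        byte[] ok = "<sqlMapConfig/>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "<a>caf\u00e9</a>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstInvalidOffset(ok));  // -1
        System.out.println(firstInvalidOffset(bad)); // 6
    }
}
```

In the second example the ISO-8859-1 byte for é sits at offset 6, so that is the offset reported. A truncated sequence at the very end of the input is reported as malformed too.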
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Brice Ruth
> > > > > > > > > > > Software Engineer, Madison WI
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> 
>
