abdera-user mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: more fun with character encodings
Date Sat, 08 Sep 2007 03:21:32 GMT
Character encodings are evil.  The conversion to UTF-8 is likely
happening; it's just not being done correctly.  Can you provide a bit
more context, e.g. the code that actually sets the title value?
Have you tried writing the string value out to a UTF-8 writer without
Abdera in the mix?  For example, what do you get from something like:

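    // needs java.io.Writer and java.io.OutputStreamWriter;
    // titlevalue is the title string as read from the database
    Writer w = new OutputStreamWriter(System.out, "UTF-8");
    w.write(titlevalue);
    w.flush();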
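
Before any writer is involved, it may also help to confirm whether the String itself is intact. A minimal sketch, assuming the title has already been read into a String (the class name `CodePoints`, the helper `describe`, and the sample value are hypothetical): print each code point in U+XXXX form. An intact é should appear as U+00E9; if it appears as U+221A U+00A9 (the characters "√©"), the UTF-8 bytes were most likely decoded as MacRoman somewhere upstream of Abdera.

```java
public class CodePoints {
    // Render each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the title value read from the database.
        String titleValue = "caf\u00e9";
        System.out.println(describe(titleValue)); // U+0063 U+0061 U+0066 U+00E9
    }
}
```
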

- James

Brian Moseley wrote:
> i'm running into an issue similar to the one discussed earlier this week
> with regard to problem data.
> 
> as was mentioned earlier, it turns out that the os x native character
> encoding is MacRoman. well, it appears that even though both my mysql
> database and my jdbc connection are configured to use utf8, at some
> point the data taken from the db and inserted into an atom feed is
> turning up in MacRoman, even though the ResponseContext's content type
> is set to "application/atom+xml; charset=UTF-8".
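
To make the MacRoman theory concrete: the same character produces different bytes under the two encodings. The sketch below assumes a JDK whose extended charsets include `x-MacRoman` (hence the `Charset.isSupported` check); the class name `EncodingBytes` and the `hex` helper are hypothetical. For é it should print C3 A9 for UTF-8 and 8E for MacRoman, and UTF-8 bytes misread as MacRoman come out as "√©".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Format bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // 'é'
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));

        // x-MacRoman lives in the JDK's extended charset set; check before use.
        if (Charset.isSupported("x-MacRoman")) {
            Charset macRoman = Charset.forName("x-MacRoman");
            System.out.println("MacRoman: " + hex(s.getBytes(macRoman)));
            // UTF-8 bytes decoded as MacRoman: the classic mojibake.
            String misread = new String(s.getBytes(StandardCharsets.UTF_8), macRoman);
            System.out.println("misread:  " + misread);
        }
        System.out.println("default:  " + Charset.defaultCharset());
    }
}
```
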
> 
> from my re-reading of the various recent threads and my examining of
> the code in the 0.3.0 branch, it seems like the value i set for an
> entry's title (for instance) should be converted into utf8 while the
> entry is being serialized. but it's clearly not. when i look at the
> feed as it's fetched from my server by curl, in Terminal.app, the
> non-ascii character in the entry title is rendered using what i like
> to call the "wtf" glyph rather than the one that represents the actual
> character in question. and when i run the feed through the
> validome.org validator, it complains about this character being an
> invalid utf8 character.
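
This would also explain the validator's complaint. A MacRoman é is the single byte 0x8E, which is never valid on its own in UTF-8 (it is a continuation byte with no lead byte). A minimal sketch using only the JDK's `CharsetDecoder` (the class name `StrictUtf8` and method `isValidUtf8` are hypothetical): a strict decoder rejects the byte, while a lenient `new String(..., UTF_8)` substitutes U+FFFD, which is roughly the "wtf" glyph a terminal displays.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // 'é' encoded as UTF-8
        byte[] macRoman = {(byte) 0x8E};          // 'é' encoded as MacRoman
        System.out.println(isValidUtf8(utf8));     // true
        System.out.println(isValidUtf8(macRoman)); // false

        // A lenient decode replaces the malformed byte with U+FFFD.
        String lenient = new String(macRoman, StandardCharsets.UTF_8);
        System.out.println(lenient.equals("\uFFFD")); // true
    }
}
```
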
> 
> when i run the server and database on linux and get a non-ascii
> character into the database, viewing the corresponding entry document
> in Terminal.app shows me the expected character, not the wtf one.
> 
> i've run through all of my code looking for places where we might be
> instantiating a Reader without specifying an encoding, but i can't
> find any. i'm using the 0.3.0-incubating jars that i deployed earlier
> today into the people.apache.org/m2-incubating-repository which
> contain the recent default encoding fixes. so i'm at a loss as to what
> could be going on. i feel like i'm missing something basic with regard
> to character encodings. any pointers?
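
Readers are only half of it: any stream or string conversion that is not given an explicit charset falls back to the JVM default, including `InputStreamReader`/`OutputStreamWriter` constructed without one, `FileReader`/`FileWriter`, no-argument `String.getBytes()`, and `new String(byte[])`. A minimal sketch that prints the default charset and round-trips a string with the charset spelled out on both sides (class and method names are hypothetical). If the default reported on the Mac is MacRoman while on Linux it is UTF-8, that would match the behavior described; launching the JVM with `-Dfile.encoding=UTF-8` might be a quick way to test that theory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsets {
    // Encode text through a Writer with an explicit UTF-8 charset.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // writer is closed (and flushed) by now
    }

    // Decode bytes through a Reader with an explicit UTF-8 charset.
    static String readUtf8(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the JVM falls back to when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String title = "caf\u00e9"; // hypothetical sample title
        byte[] bytes = writeUtf8(title);
        System.out.println(bytes.length);                  // 5 ('é' is two bytes)
        System.out.println(readUtf8(bytes).equals(title)); // true
    }
}
```
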
> 
> for reference, here's a url for the entry document as served by os x.
> notice that the final character of both the title and the summary is
> the wtf character.
> 
> http://bcm.osafoundation.org:8080/chandler/atom/item/a05e2870-5cce-11dc-f4b0-84f152603f14?ticket=fnwrt8htw1
> 
> and here is what happens when i plug that url into validome's atom validator:
> 
> http://www.validome.org/rss-atom/validate?lang=en&url=http://bcm.osafoundation.org:8080/chandler/atom/item/a05e2870-5cce-11dc-f4b0-84f152603f14%3fticket=fnwrt8htw1&version=atom_1_0
> 
> thanks!
> 
