db-derby-dev mailing list archives

From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: Japanese characters on the ASF Derby site garbled again
Date Tue, 11 Aug 2009 09:32:32 GMT
Myrna van Lunteren wrote:
> On Thu, Aug 6, 2009 at 3:24 AM, Kristian Waagan<Kristian.Waagan@sun.com> wrote:
>> Hello,
>> I noticed that the Japanese characters on the manuals page of the ASF Derby
>> site got garbled again in the last commit. I went in and backed out the
>> changes made to manuals/index.html. They should be visible shortly.
>> I think we have talked about this issue before, and couldn't really
>> determine if the problem was with the platform used to build the site or
>> with an environment setting. This is maybe something the next person to
>> update the site should watch out for, and we should consider adding
>> something in the instructions to help avoid this happening in the future.
>> Regards,
>> --
>> Kristian
> Hm,
> I think I put this in the instructions on
> http://wiki.apache.org/db-derby/DerbySnapshotOrRelease after the
> trouble last time, re "Update
> src/documentation/content/xdocs/manuals/index.xml: Add the link to the
> version's manuals (which you uploaded in the previous step)."
> --------------------------
> Before checking in changes to build/site/manuals/index.html, be
> careful to check for changes to areas other than those actually
> modified - especially the Japanese characters; some builds may garble
> them.
> --------------------------

Thanks for adding this, Myrna.

> Maybe Kathey did not see this or did not see anything wrong with the
> index.html...
> Any suggestions on how to improve that? It's probably too vague?

By configuring your terminal to use UTF-8, you can easily see the 
garbled characters in the diff, as they pop up as question marks.
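
If someone wants an automated check in addition to eyeballing the diff, 
something along these lines might do as a rough sketch. The function name 
is made up for illustration, and the pattern is only a heuristic: it 
assumes garbling shows up as runs of question marks or as the Unicode 
replacement character (U+FFFD), which may not cover every build platform.

```python
import re

# Heuristic only: U+FFFD is the Unicode replacement character, and a run of
# two or more "?" is what a lossy charset conversion tends to leave behind.
SUSPECT = re.compile(r"\ufffd|\?{2,}")


def suspect_lines(text):
    """Return the 1-based numbers of lines that look garbled."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]
```

Running this over the decoded contents of the generated index.html before 
checking in would list the lines worth a closer look. A single question 
mark in ordinary text is deliberately not flagged.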

> Maybe a link to your commits, Kristian?

In this case I just backed out the last change for the 
manuals/index.html file:
Note that the commit mail sent out doesn't seem to use the correct 
encoding (could be my mail reader as well).

Maybe it would be better to write the Japanese text as numeric character 
references (Unicode code points) instead of raw characters?
That way, every editor should be able to deal with the HTML file, and 
nothing should mess up the Japanese text. On the other hand, Japanese 
readers would most likely need to open the file in a web browser to 
see what the text says...

As an example, a character would probably look like "&#x4E0E;" (hex) or 
"&#19982;" (decimal). We could easily convert the inserted characters by 
using an editor capable of showing the corresponding Unicode values for 
the characters (I know how to do this in vim, probably works in many 
others too).
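
For those who would rather script the conversion than do it in an editor, 
here is a minimal sketch. The helper name and its parameter are my own 
invention, and it assumes the text has already been decoded correctly 
(i.e. read as UTF-8); it leaves ASCII alone and rewrites everything else.

```python
def to_char_refs(text, hexadecimal=True):
    """Replace every non-ASCII character with a numeric character reference."""
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            # Plain ASCII passes through untouched.
            pieces.append(ch)
        elif hexadecimal:
            pieces.append("&#x{:X};".format(code_point))
        else:
            pieces.append("&#{};".format(code_point))
    return "".join(pieces)
```

For the decimal form only, Python's built-in error handler does the same 
thing: text.encode("ascii", "xmlcharrefreplace") returns the converted 
text as bytes.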

Lastly, I observe one BOM (byte order mark). Since the header says this 
is UTF-8, it shouldn't be required, right?
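
To check a file for this, a small sketch like the following could be used. 
The function name is hypothetical; it works on the raw bytes of the file 
and reports where the UTF-8 BOM byte sequence (EF BB BF) occurs, whether 
at the start or elsewhere.

```python
import codecs


def bom_offsets(data):
    """Return the byte offsets of every UTF-8 BOM (EF BB BF) in data."""
    offsets = []
    position = data.find(codecs.BOM_UTF8)
    while position != -1:
        offsets.append(position)
        # Continue searching after the BOM that was just found.
        position = data.find(codecs.BOM_UTF8, position + len(codecs.BOM_UTF8))
    return offsets
```

The input would be the file read in binary mode, for example 
open(path, "rb").read(). An empty list means there is no BOM.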

> Myrna
