tomcat-users mailing list archives

From Drew Sudell <asud...@acm.org>
Subject Problem with passing japanese values to a servlet
Date Wed, 12 Jun 2002 14:56:16 GMT
mubariz kharbe writes:
 > Hi,
 > 
 > I am developing an internationalized web-based application using Tomcat 3.1 on
 > Windows 2000. I am facing the following problems:
 > 
 > When I pass Japanese values to the servlet and retrieve the value using
 > 	myData = httpservletrequest.getParameter("foo");
 > I get ?? in myData.
 > So I used
 > 	myDataNew = new String(myData.getBytes("ISO-8859-1"),"UTF-8");
 > This is the solution I found on most of the forums I looked at.

That is the way to manually transcode data that came in as UTF-8 and
is the way most people have done it in Servlet 2.2.
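A minimal sketch of that manual transcoding as a helper, assuming the
container decoded the raw bytes as ISO-8859-1 (the class and method
names are made up for illustration).  The trick only works because
ISO-8859-1 maps every byte to one character and back, so nothing is
lost; if the first decode had replaced bytes with substitution
characters, re-encoding cannot recover them, as the second case shows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class ManualTranscode {
    // "日本語" written with Unicode escapes so the source file's own encoding does not matter.
    static final String JAPANESE = "\u65E5\u672C\u8A9E";

    // Same as new String(s.getBytes("ISO-8859-1"), "UTF-8"), but without checked exceptions.
    static String latin1ToUtf8(String misDecoded) {
        byte[] original = misDecoded.getBytes(StandardCharsets.ISO_8859_1); // undo the wrong decode
        return new String(original, StandardCharsets.UTF_8);                // decode correctly
    }

    public static void main(String[] args) {
        byte[] posted = JAPANESE.getBytes(StandardCharsets.UTF_8);

        // Lossless first decode: every byte survives as one char, so the fix works.
        String viaLatin1 = new String(posted, StandardCharsets.ISO_8859_1);
        System.out.println(JAPANESE.equals(latin1ToUtf8(viaLatin1))); // true

        // Lossy first decode: non-ASCII bytes become U+FFFD and cannot be brought back.
        String viaAscii = new String(posted, StandardCharsets.US_ASCII);
        System.out.println(JAPANESE.equals(latin1ToUtf8(viaAscii)));  // false
    }
}
```

In a Servlet 2.2 servlet this would be applied to the result of
getParameter("foo"), again assuming the container's default decode was
ISO-8859-1.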

 > I still get the value of myDataNew as ??.
 > I am able to get the correct value in myDataNew only after I boot
 > my server with the default locale set to Japanese. But I cannot do that,
 > since my application is web based and needs to support all
 > languages. So the server must run on an English
 > OS. This is also a business requirement.
 > Question 1. What should be done so that, with the server running on an English OS, I can
 > get the correct value in myData for all languages, especially Japanese?

This isn't simple.  But I'll try to point you in the right direction
below.

 > 
 > The new Tomcat 4.0.3 implements Servlet 2.3, which provides a way to set
 > the character encoding for the HttpServletRequest. I upgraded my Tomcat server to 4.0.3.
 > Now I tried using 
 > 	httpservletrequest.setCharacterEncoding("UTF-8");
 > 	myData = httpservletrequest.getParameter("foo");

This is a better way, new in Servlet 2.3.  I'd suggest it in
preference to the manual transcoding in the example above.  It saves a
bit of overhead by doing one correct transcoding instead of three:
fouling it up (the container's default ISO-8859-1 decode), undoing it,
and then getting it right.
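The difference can be sketched without a container.  The byte array
below stands in for the posted form data; threeSteps mimics the Servlet
2.2 workaround, and oneStep is what the container can do once
setCharacterEncoding("UTF-8") has been called before the first
getParameter() call.  The class and method names are hypothetical, not
part of the Servlet API.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: names are hypothetical, not part of the Servlet API.
class TranscodeCount {
    // Servlet 2.2 workaround: three transcodings.
    static String threeSteps(byte[] raw) {
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);  // 1. container decodes as ISO-8859-1
        byte[] undone = wrong.getBytes(StandardCharsets.ISO_8859_1);  // 2. application undoes it
        return new String(undone, StandardCharsets.UTF_8);            // 3. application decodes as UTF-8
    }

    // Servlet 2.3: with the request encoding set first, one decode suffices.
    static String oneStep(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8); // "日本語" as posted
        System.out.println(threeSteps(raw).equals(oneStep(raw)));          // true
    }
}
```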

 > And I am still getting the value of myData as ?? for Japanese values.
 > 
 > Question 2. Is there a problem in the way I am using the httpservletrequest.setCharacterEncoding
 > method? What else needs to be done?
 >

Not particularly.
 
 > Any advice will be greatly appreciated.
 > 

My first question is: why do you believe the data being posted is UTF-8
to begin with?  This is really less a question about servlets and Java
than one about HTML forms and browsers.

The game is to get the browser to post the data in an encoding that
you can predict and to set the request encoding to that.

Above you mention that things work out when you set the default locale
to Japanese.  That makes me think you're getting the data posted in a
native Japanese encoding such as Shift-JIS or EUC, depending on the
platform.  If so, decoding it as UTF-8 won't work.
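To see why, here is a small sketch assuming the browser posted
Shift_JIS (Charset.forName("Shift_JIS") is available in typical JDK
installations; EUC-JP would behave the same way).  Decoding those bytes
as UTF-8 produces replacement characters, while decoding them with the
encoding they were actually written in recovers the text.  The class
name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class WrongCharsetDemo {
    // "日本語" written with Unicode escapes so the source file's own encoding does not matter.
    static final String JAPANESE = "\u65E5\u672C\u8A9E";
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Decode the posted bytes with whatever charset the server assumes.
    static String decodeAs(byte[] posted, Charset assumed) {
        return new String(posted, assumed);
    }

    public static void main(String[] args) {
        // Assumption: the browser posted the form data in Shift_JIS.
        byte[] posted = JAPANESE.getBytes(SHIFT_JIS);

        System.out.println(JAPANESE.equals(decodeAs(posted, StandardCharsets.UTF_8))); // false
        System.out.println(JAPANESE.equals(decodeAs(posted, SHIFT_JIS)));              // true
    }
}
```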

There are a couple of strategies that one can take.  A lot depends on
the languages, browsers, and client platforms you need to support as
well as how the application is structured and how the users use it.

The easiest case is when you can do everything in a single encoding.
For example, if you only support English and Japanese, since English
is encodable in the Japanese native encodings, you could just use
those.  If you have to support a wide range of languages, UTF-8 might
be a good answer, so long as you can be sure the browsers you support
will post the data back as UTF-8.  [I've never had to make that trick
work, but suspect that sending the pages as UTF-8 and/or setting the
acceptable character sets on the form (the accept-charset attribute)
SHOULD do the trick.]
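One way to get at least some assurance on the server side is a strict
check that the posted bytes are well-formed UTF-8, sketched below with
java.nio.charset.CharsetDecoder set to report, rather than replace,
malformed input.  The class name is hypothetical, and the check is only
a necessary condition: plain ASCII passes it, and it cannot tell you
which encoding the browser actually intended.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical validation helper.
class Utf8Check {
    // True if the bytes decode as UTF-8 without any malformed or unmappable sequences.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // "日本語"
        System.out.println(isValidUtf8(japanese.getBytes(StandardCharsets.UTF_8)));       // true
        System.out.println(isValidUtf8(japanese.getBytes(Charset.forName("Shift_JIS")))); // false
    }
}
```

Where the raw bytes come from (for instance the request body read
before any getParameter() call) is left to the application here.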

The other game you can play is to know what encoding you expect back
on a per form basis.  Basically this ends up being multiple sub-sites, 
one per encoding, for the application.  This can be done dynamically
or statically.  Statically (copy pages and alter them) is "easier" but
harder to maintain as the number of encodings grows.  In this scenario 
you have to guess somehow what encoding each post is coming in as, or 
embed the information somewhere (in the url, in a hidden parameter, in 
the session, etc.)
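A minimal sketch of the hidden-parameter variant, assuming each form
page embeds its own encoding in a hidden field (called _charset_ here
purely for illustration) and that the application parses the raw
application/x-www-form-urlencoded body itself, since the request
encoding generally has to be set before the first getParameter() call.
Browsers percent-encode non-ASCII bytes in such bodies, so the body can
be handled as a plain ASCII string.  It needs Java 10+ for the Charset
overloads of URLDecoder.decode and URLEncoder.encode.  Class and field
names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser; the field name below is an assumption for illustration.
class FormCharsetParser {
    // Each form would carry e.g. <input type="hidden" name="_charset_" value="Shift_JIS">
    static final String CHARSET_FIELD = "_charset_";

    // body: an application/x-www-form-urlencoded string such as "a=1&b=%93%FA".
    static Map<String, String> parse(String body, Charset fallback) {
        String[] pairs = body.split("&");

        // Pass 1: find the declared encoding. Charset names are plain ASCII,
        // so decoding this one field with ISO-8859-1 is safe.
        Charset cs = fallback;
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(CHARSET_FIELD)) {
                String name = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.ISO_8859_1).trim();
                try {
                    cs = Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    cs = fallback; // illegal or unsupported charset name
                }
            }
        }

        // Pass 2: decode every name and value with the declared encoding.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(rawName, cs), URLDecoder.decode(rawValue, cs));
        }
        return params;
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String japanese = "\u65E5\u672C\u8A9E"; // "日本語"
        // Simulate a post from a Shift_JIS form page.
        String body = CHARSET_FIELD + "=Shift_JIS&foo=" + URLEncoder.encode(japanese, sjis);
        System.out.println(japanese.equals(parse(body, StandardCharsets.UTF_8).get("foo"))); // true
    }
}
```

Forms without the hidden field fall back to whatever single encoding
the application otherwise assumes.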

There are a few good ideas at the end of this presentation:
http://www.w3.org/Talks/1999/0830-tutorial-unicode-mjd/

Bottom line, there's no "right answer" to handling forms in a
completely internationalized site.  It would be nice if browsers
actually set the encoding on the content type of the posted data.
But I've yet to see one that did.  That forces the use of heuristics,
guesswork and silly kludges.
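A minimal sketch of the kind of heuristic this forces, assuming the
application has an ordered list of candidate encodings to try: return
the first one that decodes the raw bytes without errors.  Order
matters, since pure ASCII decodes cleanly in all of them and a
single-byte encoding such as ISO-8859-1 accepts every byte sequence;
that is exactly why it stays guesswork.  Class and method names are
hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical heuristic helper.
class EncodingGuesser {
    // Returns the first candidate that decodes the bytes without malformed
    // or unmappable sequences, or empty if none does.
    static Optional<Charset> guess(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(bytes));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // Not valid in this encoding; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        List<Charset> order = List.of(StandardCharsets.UTF_8, sjis); // assumed preference order
        String japanese = "\u65E5\u672C\u8A9E"; // "日本語"
        System.out.println(guess(japanese.getBytes(StandardCharsets.UTF_8), order).map(Charset::name).orElse("none")); // UTF-8
        System.out.println(guess(japanese.getBytes(sjis), order).map(Charset::name).orElse("none"));                   // Shift_JIS
    }
}
```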

I've pulled together a few other links over time and saved them on
my home page at http://www.op.net/~asudell/info/i18n/index.html.
Some of those might help you too.

-- 
        Drew Sudell     asudell@acm.org      http://www.op.net/~asudell


