cocoon-users mailing list archives

From Karl Øie <k...@gan.no>
Subject RE: urgent encoding problem...
Date Thu, 13 Dec 2001 13:07:09 GMT
arg, ok, you are using cocoon1. well, in cocoon1 serializers are called formatters, i think (i
have never used c1 myself). HTMLSerializer is called HTMLFormatter there and can be found at org.apache.cocoon.formatter.HTMLFormatter.

http://cvs.apache.org/viewcvs.cgi/xml-cocoon/src/org/apache/cocoon/formatter/

please note that it looks like the formatter in c1 is not using javax.xml directly, but rather
the org.apache.xml apis.


regards, karl øie

  -----Original Message-----
  From: Arun.N [mailto:arun.n@eximsoft.com]
  Sent: 13 December 2001 13:32
  To: cocoon-users@xml.apache.org; karl@gan.no
  Subject: Re: urgent encoding problem...


  Hi karl,
                  In cocoon 1.8.2 there is no XMLSerializer or HTMLSerializer. can you tell me
which files do the work of these in 1.8.2, please?
  reg
  Arun.N
                  
  ----- Original Message ----- 
    From: Karl Øie 
    To: cocoon-users@xml.apache.org 
    Sent: Wednesday, December 12, 2001 6:33 PM
    Subject: RE: urgent encoding problem...


    erm... that code snippet was from the XMLSerializer, not the HTMLSerializer as i wrote,
but the approach should be the same.. sorry!
     
    regards, karl øie
      -----Original Message-----
      From: Karl Øie [mailto:karl@gan.no]
      Sent: 12 December 2001 13:51
      To: cocoon-users@xml.apache.org; Arun.N
      Subject: RE: urgent encoding problem...


      the increasing page size does not concern me (:-) because the serializer should write
directly to the PrintWriter from response.getWriter(). then again, the serializer does not flush before
the end of the page, so users must wait until the page is finished.
       
      when it comes to the missing characters, you could try to create your own serializer.
let's take a look at the code for the HTMLSerializer (org.apache.cocoon.serialization.XMLSerializer):
       
      the method for the outputstream uses javax.xml to set the transformer's properties:
       
          public void setOutputStream(OutputStream out) {
              try {
                  super.setOutputStream(out);
                  this.handler = getTransformerFactory().newTransformerHandler();
                  format.put(OutputKeys.METHOD,"xml");
                  handler.setResult(new StreamResult(this.output));
                  handler.getTransformer().setOutputProperties(format);
                  this.setContentHandler(handler);
                  this.setLexicalHandler(handler);
              } catch (Exception e) {
                  getLogger().error("XMLSerializer.setOutputStream()", e);
                  throw new RuntimeException(e.toString());
              }
          }
       
       
      if you force the transformer here to use your encoding, like this:
       
       
          public void setOutputStream(OutputStream out) {
              try {
                  super.setOutputStream(out);
                  this.handler = getTransformerFactory().newTransformerHandler();
                  format.put(OutputKeys.METHOD,"xml");

                  format.put(OutputKeys.ENCODING,"SHIFT_JIS");    // <----- add this line

                  handler.setResult(new StreamResult(this.output));
                  handler.getTransformer().setOutputProperties(format);
                  this.setContentHandler(handler);
                  this.setLexicalHandler(handler);
              } catch (Exception e) {
                  getLogger().error("XMLSerializer.setOutputStream()", e);
                  throw new RuntimeException(e.toString());
              }
          }
       
      and then recompile cocoon. now try your page and tell me what happens. please also
read this note from the xalan faq:
       
      http://xml.apache.org/xalan-j/usagepatterns.html#outputencoding
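
A minimal standalone sketch of the effect described above, using only the JDK's javax.xml.transform API rather than Cocoon itself. The class and method names (OutputEncodingSketch, serialize) are made up for illustration. It assumes the serializer bundled with the JDK, which writes a numeric character reference for any character the chosen output encoding cannot represent and writes the character itself otherwise. US-ASCII and UTF-8 are used so the sketch runs on any JDK; on a JDK that ships the Shift_JIS charset the same property would be set to "Shift_JIS".

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputEncodingSketch {

    /** Runs an identity transform and serializes the result in the given encoding. */
    static String serialize(String xml, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            // The property that the modified setOutputStream() above adds.
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));

            // Decode the bytes with the same encoding so the result can be inspected.
            return new String(out.toByteArray(), Charset.forName(encoding));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Three hiragana characters, written as escapes so the source file stays ASCII.
        String xml = "<label>\u3053\u3061\u3089</label>";
        System.out.println(serialize(xml, "US-ASCII")); // characters become numeric references
        System.out.println(serialize(xml, "UTF-8"));    // characters are written as-is
    }
}
```

If the page still shows numeric references after the encoding is forced, the serializer in use probably does not know the encoding name it was given, which is what the xalan note linked above discusses.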
       
       
      regards, karl øie
       
       
        -----Original Message-----
        From: Arun.N [mailto:arun.n@eximsoft.com]
        Sent: 12 December 2001 13:33
        To: cocoon-users@xml.apache.org; karl@gan.no
        Subject: Re: urgent encoding problem...


        Thank you karl. 
                I have fixed that problem by following the method you described.

        everything is being displayed properly, but there is still a problem in the source
code.
        if the string contains    ‚±‚¿‚ç‚É–|–óŒã‚Ì•¶Í‚ª•\Ž¦‚³‚ê‚Ü‚·B
        it is printing こちらに翻訳後の文章が表示されます。
        but when i look into the source of the output html it is showing  &#12371;&#12385;&#12425;&#12395;&#32763;&#35379;&#24460;&#12398;&#25991;&#31456;&#12364;&#34920;&#31034;&#12373;&#12428;&#12414;&#12377;&#12290;
         
        if the source has characters like this also ‚±‚¿‚ç‚É–|–óŒã‚Ì•¶Í‚ª•\Ž¦‚³‚ê‚Ü‚·B
it will work fine and the output japanese characters will be the same. but why is the cocoon processor
replacing everything with numbers? My concern here is that it increases the page size.
        any comments in this regard?
        Thankx in advance,
        Arun.N
         
         
          ----- Original Message ----- 
          From: Karl Øie 
          To: cocoon-users@xml.apache.org ; Arun.N 
          Sent: Wednesday, December 12, 2001 5:44 PM
          Subject: RE: urgent encoding problem...


          it's not that people don't bother to answer you, but a lot of people here don't have
any experience with shift-jis encoding. as a Norwegian I have the same problem: non-Scandinavians
can hardly reproduce problems involving Scandinavian characters.
           
          when it comes to your string problem there can be several sources. first of all
you can test the dom by feeding it a string that has been created with a declared encoding,
like :
           
          new String( "æ e trønder æ å" ); will not work on all jdks/platforms
          new String( "æ e trønder æ å", "UTF-16" ); will work on most sane jdks/platforms

          try to create all your strings with shift_jis forced, just in case. second, find
out whether StringWriter supports shift_jis. as far as i know StringWriter works
on chars and strings and should support shift_jis if all strings fed to it were created as shift_jis.
lastly, there are some problems regarding the PrintWriter that the servlet api uses to
return serialized content to the browser. try to serialize to a file instead of to the browser;
if the file accepts shift_jis then you should look up fixes/gotchas regarding shift_jis and
jsp, as cocoon uses the jsp mechanism to send the response back to the user.
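
A short sketch of what creating a string with a declared encoding looks like in plain Java. Note that the JDK constructor which takes a charset expects a byte array, not a string literal. The class and method names (DeclaredCharsetSketch, roundTripUtf16, roundTripLatin1) are hypothetical. UTF-16 and ISO-8859-1 are used because every JDK is required to support them; the same calls would take Charset.forName("Shift_JIS") on a JDK that ships that charset.

```java
import java.nio.charset.StandardCharsets;

public class DeclaredCharsetSketch {

    /** Encodes the text to bytes and decodes it again, naming UTF-16 both times. */
    static String roundTripUtf16(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(bytes, StandardCharsets.UTF_16);
    }

    /** Same round trip through ISO-8859-1, which has no Japanese characters. */
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three hiragana characters, written as escapes so the source file stays ASCII.
        String text = "\u3053\u3061\u3089";
        System.out.println(roundTripUtf16(text).equals(text)); // the text survives
        System.out.println(roundTripLatin1(text));             // each character is replaced by '?'
    }
}
```

The second method shows where a run of question marks typically comes from: the characters were pushed through a charset that cannot represent them, and the encoder substituted its replacement byte.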

          the best place to start looking is the xalan faqs and docs because if you use the
xml or html serializer it's using the xalan implementations.

          regards, karl øie



            -----Original Message-----
            From: Arun.N [mailto:arun.n@eximsoft.com]
            Sent: 12 December 2001 12:48
            To: cocoon-users@xml.apache.org
            Subject: Re: urgent encoding problem...


            Hi all,
                        First of all i thank everybody for not bothering to reply. I corrected
the second and the third problem. If the list is still alive and anyone cares to give me a solution
for the first problem please do reply.....
            thankx,
            Arun.N
             
              ----- Original Message ----- 
              From: Arun.N 
              To: cocoon-users@xml.apache.org 
              Sent: Tuesday, December 11, 2001 1:31 PM
              Subject: urgent encoding problem...


              Hi all,
                          I have some problems with xsp pages and encoding. When i try
to display Shift_JIS encoded characters, they are not displayed properly.
              when i hard code the japanese characters it works properly. for example,
in this xsp page
               
              <?xml version="1.0" encoding="Shift_JIS"?>
              <?cocoon-process type="xsp"?>
              <?cocoon-process type="xslt"?>
              <?xml-stylesheet href="xsl/viewMail-to-html.xsl" type="text/xsl" ?>
              <xsp:page
                language="java"
                encoding="Shift_JIS"
                xmlns:xsp="http://www.apache.org/1999/XSP/Core"
                xmlns:request="http://www.apache.org/1999/XSP/Request"
                xmlns:util="http://www.apache.org/1999/XSP/Util" 
               >
              <page>
                 <title>melpo View Mail</title>
                <body>
                      <label>あなたのPCの中のメールクライアントが再開されました。</label>
                  </body>
              </page>
              </xsp:page>
               
              the html display is working fine and the characters appear properly,
but the source of the html shows
              <html>
                  <body>
                  &#12354;&#12394;&#12383;&#12398;PC&#12398;&#20013;&#12398;&#12513;&#12540;&#12523;&#12463;&#12521;&#12452;&#12450;&#12531;&#12488;&#12364;&#20877;&#38283;&#12373;&#12428;&#12414;&#12375;&#12383;&#12290;

                  </body>
              </html>
              <!-- This page was served in 2278 milliseconds by Cocoon 1.8.2 -->
               
              but why are the characters converted into numbers? the problem i have here is
that this consumes more bytes, so if the device has a size limit on the source of the
page then it is a problem. if the characters were left as they are, the source page would consume
fewer bytes.
               
               
              The second problem is that when i dynamically include xml in my xsp it does not work.
But when the same string is hardcoded in the xsp page it works fine.
              <?xml version="1.0" encoding="Shift_JIS"?>
              <?cocoon-process type="xsp"?>
              <?cocoon-process type="xslt"?>
              <?xml-stylesheet href="xsl/viewMail-to-html.xsl" type="text/xsl" ?>
              <xsp:page
                language="java"
                encoding="Shift_JIS"
                xmlns:xsp="http://www.apache.org/1999/XSP/Core"
                xmlns:request="http://www.apache.org/1999/XSP/Request"
                xmlns:util="http://www.apache.org/1999/XSP/Util" 
               >
              <page>
                 <title>melpo View Mail</title>
                <body>
                      <xsp:logic>
                           String xml = (String) request.getAttribute("xml");
                          <xsp:content>
                             <util:include-expr><util:expr>xml</util:expr></util:include-expr>
 // this will append an xml string like <label>‚ ‚È‚½‚ÌPC‚Ì’†‚̃[J‚³‚ê‚Ü‚µ‚½B
</label>          
                          </xsp:content>
                      </xsp:logic>
                  </body>
              </page>
              </xsp:page>
               
              i am getting an error 
               
              org.xml.sax.SAXException: An invalid XML character (Unicode: 0x13) was found
in the element content of the document. [FATAL ERROR] [File: "null" Line: 1 Column: 109] (nested
exception: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13) was found
in the element content of the document.)
              But if the string i am getting is hardcoded, it works fine, because when i hardcode
it, the xsp page, when compiled, converts all the characters to those numbers.
whenever the string is dynamically included, it does not work.
               
              and the third problem is:
                  when i load a string into a dom and then get the string back, the encoding information
is gone. The characters displayed are ???????????
                      String fullXml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?><Response><Message>Mail
Client in your PC has been ƒƒOƒAƒEƒg Restarted ƒGƒLƒTƒCƒg : –|–󁄗˜—p‹K–ñ
xxx </Message></Response>";
               
                    DOMParser parser = new DOMParser();
                    InputStream is = new ByteArrayInputStream(fullXml.getBytes());
                    InputSource isource=new InputSource(is);
                    parser.parse(isource);
                    Document xmlDoc= parser.getDocument();       //created an dom
               ------------ doing some manipulation ------------------
                    OutputFormat    format  = new OutputFormat( xmlDoc );   //Serialize DOM
                    StringWriter  stringOut = new StringWriter();           // Writer will be a String
                    XMLSerializer    serial = new XMLSerializer( stringOut, format );
                    serial.asDOMSerializer();                               // As a DOM Serializer
                    serial.serialize( xmlDoc.getDocumentElement() );
                    String returnXML = stringOut.toString();  // got back the xml as String.
               
              now if i display the string " returnXML " all the japanese characters are gone.
the output is only "???????????"
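
One way the round trip above could be sketched with the JDK's own JAXP classes instead of the Xerces DOMParser and XMLSerializer used in the snippet. This is an assumption-laden illustration, not the fix that was eventually applied: the class name DomRoundTripSketch and its roundTrip method are hypothetical. The idea is to hand the parser a character stream, so the string is never converted to bytes with the platform default charset (which is what fullXml.getBytes() does), and to serialize back into a StringWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomRoundTripSketch {

    /** Parses the XML text into a DOM and serializes the root element back to a String. */
    static String roundTrip(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A Reader gives the parser characters directly; no byte decoding is involved.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // ... DOM manipulation would go here ...

            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Katakana written as escapes so the source file stays ASCII.
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?>"
                + "<Response><Message>\u30ed\u30b0\u30a2\u30a6\u30c8</Message></Response>";
        System.out.println(roundTrip(xml));
    }
}
```

If the input really does arrive as bytes, the alternative is to make the byte encoding match the declaration, for example by calling getBytes with the same charset name that appears in the XML declaration.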
               
              Can any of you please give a solution for these problems, as it is very urgent
for me? I have been trying to solve these issues for the past 2 days and have searched the mail
archives, but i was not able to find a solution.
               
              Thankx in Advance
               
              regards,
              Arun.N,
