abdera-dev mailing list archives

From "Chris Berry (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ABDERA-60) Invalid UTF-8 chars in the AbderaClient
Date Thu, 06 Sep 2007 00:21:34 GMT

    [ https://issues.apache.org/jira/browse/ABDERA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525263 ]

Chris Berry commented on ABDERA-60:
-----------------------------------

Reviewing my changes, this may not truly be an Abdera bug,
although my code below does work around the problem.

When we call

             FOMParser.parse( InputStream is,...  

it subsequently calls Axiom's

            StAXUtils.createXMLStreamReader(in, charset);     

(when there is a charset, which there should be, at least in my case).
This should presumably create a Reader with the proper charset,
but it definitely does not. So there is a bug somewhere in Axiom, or possibly even in Woodstox.

What is happening is that the Reader (created by StAXUtils and subsequently Woodstox)
uses the platform default encoding (MacRoman in my case).
That is why it works on Linux, where the default encoding is UTF-8.
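
To illustrate the effect described above, here is a minimal standalone sketch (not Abdera, Axiom or Woodstox code). It decodes the same UTF-8 bytes twice, once with the right charset and once with a wrong single-byte one. The class and method names are made up for the example, and ISO-8859-1 stands in for MacRoman only because it is guaranteed to be available on every JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Abdera.
class ExplicitCharsetDemo {

    // Read the whole stream through a Reader that uses the given charset,
    // rather than whatever the platform default happens to be.
    static String decode(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café": the final character takes two bytes in UTF-8.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = decode(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // prints 4
        System.out.println(wrong.length()); // prints 5: the two-byte sequence became two chars
    }
}
```

A Reader built without an explicit charset behaves like the second call whenever the platform default is not UTF-8, which matches the Mac-versus-Linux difference.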

I don't know what Herbert's default encoding is....

>>Would it be possible for you to put together a patch file with these
>>changes?

I would gladly produce a patch. 
BUT I really think you need to decide how to handle this.
When I call 

             FOMParser.parse( Reader rr,...  

it bypasses a bit of code.

IMHO, you should simply roll the required "FOMParser.parse( InputStream is,..."
code into "FOMParser.parse( Reader rr,... "
and not rely on the underlying code to do the right thing.

Oh, and for the Content-Type header, the right thing to do is call the
getCharacterEncoding method on ClientResponse.  You will still need to
verify that the value specified for the parameter is correct.
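
One way the "verify the value" step might look, as a rough sketch only: pull the charset parameter out of a Content-Type string and fall back to a default when it is missing or names a charset the JVM does not know. The class ContentTypeCharset and the method charsetOf are hypothetical and not part of Abdera; only java.nio.charset.Charset is a real API here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Abdera.
class ContentTypeCharset {

    // Return the charset named in the Content-Type value, or the fallback
    // when the parameter is absent, malformed, or unsupported on this JVM.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0 || !param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                if (Charset.isSupported(value)) {
                    return Charset.forName(value);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: treat it like a missing parameter.
            }
            return fallback;
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/atom+xml; charset=utf-8",
                StandardCharsets.ISO_8859_1)); // prints UTF-8
        System.out.println(charsetOf("application/atom+xml",
                StandardCharsets.UTF_8));      // prints UTF-8 (the fallback)
    }
}
```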

So it should be something like this:

  public BaseResponseContext(T base, boolean chunked) {
    this.base = base;
    setStatus(200);
    setStatusText("OK");
    this.chunked = chunked;
    try {
      // Previously: setContentType(getContentType().toString());
      // Append the charset so the client does not fall back to its platform default.
      setContentType(getContentType().toString() + "; charset=" + getCharacterEncoding());
    } catch (Exception e) {
      // Ignored (best effort), as before.
    }
  }
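
One caveat with plain string concatenation: if the encoding is null, the header ends up as "charset=null". A small guard along these lines might avoid that. This is only a sketch; ContentTypeBuilder and withCharset are hypothetical names, not Abdera API.

```java
import java.util.Locale;

// Hypothetical helper, not part of Abdera.
class ContentTypeBuilder {

    // Append "; charset=..." only when an encoding is known and the
    // content type does not already carry a charset parameter.
    static String withCharset(String contentType, String encoding) {
        if (contentType == null) {
            return null;
        }
        if (encoding == null || encoding.trim().isEmpty()) {
            return contentType;
        }
        if (contentType.toLowerCase(Locale.ROOT).contains("charset=")) {
            return contentType;
        }
        return contentType + "; charset=" + encoding.trim();
    }

    public static void main(String[] args) {
        System.out.println(withCharset("application/atom+xml", "UTF-8"));
        // prints application/atom+xml; charset=UTF-8
        System.out.println(withCharset("application/atom+xml", null));
        // prints application/atom+xml
    }
}
```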


> Invalid UTF-8 chars in the AbderaClient
> ---------------------------------------
>
>                 Key: ABDERA-60
>                 URL: https://issues.apache.org/jira/browse/ABDERA-60
>             Project: Abdera
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>         Environment: N/A
>            Reporter: Chris Berry
>             Fix For: 0.3.0
>
>         Attachments: abdera-utf8-bug.tar.gz
>
>
> After upgrading to the latest 0.3-SNAPSHOT SVN trunk (on ~8/27/2007) from a 0.3-SNAPSHOT download from a couple of months ago,
> and after making all required modifications (to catch up with all the API changes), I am seeing "Invalid UTF-8" errors.
> Note that these errors only occur in the AbderaClient when I call "entry.getContent()".
>
> I have attached a small, self-contained JUnit test case which reproduces/demonstrates this issue.
> It builds and runs out-of-the-box (using mvn install).
> There is also a README.txt that details the output/issue.
> This JUnit reproduces the error. It is as small as I could get it.
> My Atom Store is based on a Store and StoreProvider (based on code I received from Ugo Cei as a starting point).
> Note that all of the code in src/main/java is relatively fixed between the latest 0.3-SNAPSHOT and the 0.3-SNAPSHOT that works.
> In other words, my code stayed as fixed as possible, and the latest 0.3-SNAPSHOT is the only real variable.
> I'm not saying that the bug isn't in my code, only that it never showed up until my upgrade to 0.3-SNAPSHOT.
> I actually suspect that it may be an issue w/ Woodstox, which the latest 0.3-SNAPSHOT significantly upgrades.
> Note: I have looked very closely at the XML file(s) causing this issue.
> I ran the Unix util "iconv" on them, and AFAICT they do not contain improper UTF-8.
> Chris Berry
> chriswberry at gmail dot com

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

