abdera-user mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 19:19:04 GMT
Hmmm... well, I ran your test cases and have not been able to recreate
the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
latest woodstox and the stax parser that ships with Websphere, and was
completely unable to get the test to throw any kind of UTF-8 related
errors.

What operating system are you testing on?  What JDK?

- James

Chris Berry wrote:
> I added the following JUnit (to the JIRA), which I think proves that
> woodstox 3.2.1 is not the issue.
> It passes fine (no Exceptions thrown).
> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
> Cheers,
> -- Chris
> ===================================
> package com.homeaway.hcdata.store.provider.blogs;
> 
> import junit.framework.Test;
> import junit.framework.TestCase;
> import junit.framework.TestSuite;
> 
> import javax.xml.stream.XMLStreamReader;
> import javax.xml.stream.XMLInputFactory;
> 
> import java.io.FileInputStream;
> 
> import com.ctc.wstx.stax.WstxInputFactory;
> 
> public class WoodstoxTest extends TestCase {
> 
>     private static final String userdir = System.getProperty( "user.dir" );
> 
>     public static Test suite()
>     { return new TestSuite( WoodstoxTest.class ); }
> 
>     public void tearDown() throws Exception
>     { super.tearDown(); }
> 
>     public void setUp() throws Exception
>     { super.setUp(); }
> 
>     public void testWoodstox() throws Exception {
> 
>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
> 
>         // we will simply walk the doc and see if it throws an Exception
>         XMLInputFactory xif = new WstxInputFactory();
>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>         while (r.hasNext()) r.next();
>     }
> }
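
For anyone who wants to try the same "walk the doc and see if it throws" check without the blog file, here is a minimal sketch that feeds the parser bytes from memory. It assumes whichever StAX implementation XMLInputFactory.newInstance() finds on the classpath (not necessarily woodstox), and the class name, method name, and sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WalkDocSketch {

    // Walks every StAX event; returns false if the parser rejects the input.
    public static boolean parsesCleanly(byte[] xml) {
        try {
            // Assumption: use whichever StAX implementation is on the classpath.
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader r = xif.createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (r.hasNext()) r.next();
            } finally {
                r.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // <a> + a valid 3-byte UTF-8 sequence (euro sign) + </a>
        byte[] good = {'<', 'a', '>', (byte) 0xE2, (byte) 0x82, (byte) 0xAC, '<', '/', 'a', '>'};
        // Same document, but byte 2 of the 3-byte sequence is not a continuation byte.
        byte[] bad = {'<', 'a', '>', (byte) 0xE2, (byte) 0x28, (byte) 0xA1, '<', '/', 'a', '>'};
        System.out.println("well-formed euro sign: " + parsesCleanly(good));
        System.out.println("broken 3-byte sequence: " + parsesCleanly(bad));
    }
}
```

Swapping in new WstxInputFactory() for the factory line should exercise woodstox specifically, as in the test above.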
> 
> 
> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
> 
>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>> I have no idea what's causing this error, but I highly doubt it's
>> woodstox. Woodstox is the most highly conformant XML parser out there.
>> (but I could be wrong)
>>
>> I would strongly suggest avoiding using 2.0.5 though for a number of
>> reasons
>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>> tested with 2.x and it expects the stax api to react a certain way
>> - 3.x is faster
>> - 3.x has improved xml conformance
>>
>> I stepped through the test case a little and wasn't able to see what
>> was going on right away. I would need to get the AXIOM sources to really
>> dig in more - I suspect the bug might lie in there after a little bit
>> of digging, but that is because that's the place I haven't looked yet.
>>
>> Any chance you could catch the message being sent from the server with
>> something like TCPMon and post it to the JIRA issue?
>>
>> - Dan
>>
>> Chris Berry wrote:
>> That fixes it!!!
>>
>> I modified all of the pertinent POMs accordingly;
>> I.e.
>> <!--
>>       <dependency>
>>         <groupId>org.codehaus.woodstox</groupId>
>>         <artifactId>wstx-asl</artifactId>
>>         <version>3.2.1</version>
>>         <scope>runtime</scope>
>>       </dependency>
>> -->
>>       <dependency>
>>         <groupId>woodstox</groupId>
>>         <artifactId>wstx-asl</artifactId>
>>         <version>2.0.5</version>
>>         <scope>runtime</scope>
>>       </dependency>
>>
>> 9 POMs were affected::
>>
>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>
>> I will add this info to the JIRA.
>>
>> James,
>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>
>> FYI: we are using woodstox 3.2.1 in another project with these exact
>> same XMLs without problem??
>>
>> Thanks much,
>> -- Chris
>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>
>> I will try that. I didn't before, because I wasn't sure that it
>> wasn't required somehow internally...
>>
>> BTW: I ran these XML documents with the supposedly invalid chars through 2
>> different UTF-8 conversions as I read them from disk, before putting
>> them into the <content>.
>> I also processed them with the Unix "iconv" utility,
>> so I am pretty darn sure that there are no invalid chars in there.
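
The same kind of check iconv gives can be done from Java with a strict decoder. This is only a sketch of one way to do it, not what was actually run here: the class and method names are made up, and it assumes a JDK new enough to have java.nio.charset.StandardCharsets (on JDK 1.5, Charset.forName("UTF-8") would be the equivalent).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 with no malformed sequences.
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8Check path/to/file.xml
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "malformed UTF-8");
    }
}
```

Note that a plain new String(bytes, "UTF-8") would not catch this, since it replaces bad sequences rather than reporting them.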
>>
>> Cheers,
>> -- Chris
>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>
>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>> version of woodstox.  If dropping down to an older version addresses the
>> issue, then we can explore that as a solution.
>>
>> - James
>>
>> Chris Berry wrote:
>> Hmmm.
>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>> latest woodstox with Abdera
>> Or more correctly, maven was bringing in some chained dependencies --
>> one of which brought in woodstox 3.2.1.
>> Abdera was using woodstox 2.0.5 at that time.
>> The problem went away when I corrected this problem....
>>
>> Note, if this is your problem, you can work around it with the maven
>> <exclusions> element
>> e.g.
>>         <dependency>
>>           <groupId>com.whatever</groupId>
>>           <artifactId>foo</artifactId>
>>           <version>1.2.3</version>
>>           <exclusions>
>>             <exclusion>
>>               <groupId>org.codehaus.woodstox</groupId>
>>               <artifactId>wstx-lgpl</artifactId>
>>             </exclusion>
>>           </exclusions>
>>         </dependency>
>>
>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>> the woodstox upgrade....
>>
>> Cheers,
>> -- Chris
>>
>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>
>> Hi Chris!
>>
>> Thanks for your feedback!
>>
>> This is exactly the bug I am seeing.
>> AFAICT, it is not related to a missing <?xml version="1.0"
>> encoding="UTF-8"?>,
>> Incidentally, my code worked fine before a recent "svn up" and it has
>> no <?xml version="1.0" encoding="UTF-8"?>,
>>
>> If I understand your problem correctly, it occurs, if you parse an
>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>
>> Mine occurs, if I use an AbderaClient to create an entry on an
>> external server, which is btw a proprietary closed-source-thingi. The
>> server then gives me the error-message, while it tries to parse my
>> request.
>>
>> It seems that knowing that another person is seeing the issue
>> confirms that the issue is on Abdera's side...
>>
>> I'm not sure, if we both encounter the same problem. My problem occurs
>> also with the AbderaClient 0.22. Yours occurred only after updating to
>> 0.30-snapshot, right?
>>
>> I haven't the slightest idea, whether the problem lies in my code, in
>> the abdera-code or even in the server-code.
>>
>> My next test would be the creation of an atom-entry by hand without
>> the AbderaClient and provide an "<?xml version="1.0"
>> encoding="UTF-8"?>" to check how the server reacts.
>>
>> Regards,
>>
>> Herbert
>>
>>
>>
>> S'all good  ---   chriswberry at gmail dot com
>>
>> -- 
>> Dan Diephouse
>> MuleSource
>> http://mulesource.com | http://netzooid.com/blog
>>