abdera-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 18:23:40 GMT
I added the following JUnit test to the JIRA issue. I think it shows that
Woodstox 3.2.1 is not the culprit.
It passes fine (no exceptions thrown).
So, AFAICT, the issue is somewhere else (Abdera or Axiom?).
Cheers,
-- Chris 

===================================
package com.homeaway.hcdata.store.provider.blogs;

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLInputFactory;

import java.io.FileInputStream;

import com.ctc.wstx.stax.WstxInputFactory;

public class WoodstoxTest extends TestCase {

     private static final String userdir = System.getProperty( "user.dir" );

     public static Test suite()
     { return new TestSuite( WoodstoxTest.class ); }

     public void tearDown() throws Exception
     { super.tearDown(); }

     public void setUp() throws Exception
     { super.setUp(); }

     public void testWoodstox() throws Exception {

         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";

         // we will simply walk the doc and see if it throws an Exception
         XMLInputFactory xif = new WstxInputFactory();
         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
         while (r.hasNext()) r.next();
     }
}
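
A parser-independent check of the raw bytes can complement the test above, since the error in the subject line is about malformed UTF-8 rather than XML structure. The following is only a minimal sketch and not part of the JIRA attachment: the class name Utf8Check and the method isValidUtf8 are hypothetical, and it assumes Java 7 or later (for StandardCharsets and Files). It decodes with a strict CharsetDecoder so that any malformed byte sequence is reported instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Abdera, Axiom, or Woodstox.
final class Utf8Check {

    /** Returns true if the given bytes decode as well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw on bad input instead of
        // substituting U+FFFD, which is the default for String decoding.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the file to check, e.g. the blog XML file.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": "
                + (isValidUtf8(bytes) ? "valid UTF-8" : "malformed UTF-8"));
    }
}
```

If the file on disk passes a check like this but the error still appears, the bytes are presumably being re-encoded somewhere between the file and the parser, for example by a Reader or Writer that uses the platform default charset.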


On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:

> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
> I have no idea what's causing this error, but I highly doubt
> it's woodstox. Woodstox is the most highly conformant xml parser out
> there. (but I could be wrong)
>
> I would strongly suggest avoiding using 2.0.5 though for a number  
> of reasons
> - 3.x has many stax conformance improvements. AXIOM hasn't really  
> been tested with 2.x and it expects the stax api to react a certain  
> way
> - 3.x is faster
> - 3.x has improved xml conformance
>
> I stepped through the test case a little and wasn't able to see
> what was going on right away. I would need to get the AXIOM sources to
> really dig in more - I suspect the bug might lie in there after a
> little bit of digging, but that is because that's the place I
> haven't looked yet.
>
> Any chance you could catch the message being sent from the server  
> with something like TCPMon and post it to the JIRA issue?
>
> - Dan
>
> Chris Berry wrote:
> That fixes it!!!
>
> I modified all of the pertinent POMs accordingly;
> I.e.
> <!--
>       <dependency>
>         <groupId>org.codehaus.woodstox</groupId>
>         <artifactId>wstx-asl</artifactId>
>         <version>3.2.1</version>
>         <scope>runtime</scope>
>       </dependency>
> -->
>       <dependency>
>         <groupId>woodstox</groupId>
>         <artifactId>wstx-asl</artifactId>
>         <version>2.0.5</version>
>         <scope>runtime</scope>
>       </dependency>
>
> 9 POMs were affected:
>
> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>
> I will add this info to the JIRA.
>
> James,
> Can we move the SVN Head back to 2.0.5 until this is resolved??
>
> FYI: we are using woodstox 3.2.1 in another project with these  
> exact same XMLs without problem??
>
> Thanks much,
> -- Chris
> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>
> I will try that. I didn't before, because I wasn't sure that it
> wasn't required somehow internally...
>
> BTW: I ran these XML documents with the supposedly invalid chars through
> 2 different UTF-8 conversions as I read them from disk, before
> putting them into the <content>.
> I also processed them with the Unix "iconv" utility.
> So I am pretty darn sure that there are no invalid chars in there.
>
> Cheers,
> -- Chris
> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>
> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
> version of woodstox.  If dropping down to an older version addresses the
> issue, then we can explore that as a solution.
>
> - James
>
> Chris Berry wrote:
> Hmmm.
> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
> latest woodstox with Abdera
> Or more correctly, maven was bringing in some chained dependencies --
> one of which brought in woodstox 3.2.1.
> Abdera was using woodstox 2.0.5 at that time.
> The problem went away when I corrected this problem....
>
> Note, if this is your problem, you can work around it with the maven
> <exclusions> element,
> e.g.
>         <dependency>
>           <groupId>com.whatever</groupId>
>           <artifactId>foo</artifactId>
>           <version>1.2.3</version>
>           <exclusions>
>             <exclusion>
>               <groupId>org.codehaus.woodstox</groupId>
>               <artifactId>wstx-lgpl</artifactId>
>             </exclusion>
>           </exclusions>
>         </dependency>
>
> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
> the woodstox upgrade....
>
> Cheers,
> -- Chris
>
> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>
> Hi Chris!
>
> Thanks for your feedback!
>
> This is exactly the bug I am seeing.
> AFAICT, it is not related to a missing <?xml version="1.0"
> encoding="UTF-8"?> declaration.
> Incidentally, my code worked fine before a recent "svn up", and it has
> no <?xml version="1.0" encoding="UTF-8"?> declaration.
>
> If I understand your problem correctly, it occurs, if you parse an
> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>
> Mine occurs if I use an AbderaClient to create an entry on an
> external server, which is btw a proprietary closed-source thing. The
> server then gives me the error message while it tries to parse my
> request.
>
> It seems that knowing that another person is seeing the issue
> confirms that the issue is on Abdera's side...
>
> I'm not sure if we both encounter the same problem. My problem also
> occurs with the AbderaClient 0.22. Yours occurred only after updating to
> 0.30-snapshot, right?
>
> I haven't the slightest idea, whether the problem lies in my code, in
> the abdera-code or even in the server-code.
>
> My next test would be the creation of an atom-entry by hand without
> the AbderaClient and provide an "<?xml version="1.0"
> encoding="UTF-8"?>" to check how the server reacts.
>
> Regards,
>
> Herbert
>
>
> -- 
> GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
> Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail
>
> S'all good  ---   chriswberry at gmail dot com
>
> -- 
> Dan Diephouse
> MuleSource
> http://mulesource.com | http://netzooid.com/blog
>
>
> S'all good  ---   chriswberry at gmail dot com
>

S'all good  ---   chriswberry at gmail dot com



