abdera-user mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 18:49:56 GMT
It very well could be the java.io.Reader/InputStream stuff that
FOMParser introduces into the mix.  If we can assume that woodstox is
not the issue, that's the next place I would look.
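
To make the Reader/InputStream distinction concrete, here is a minimal sketch. It is not Abdera's or FOMParser's actual code; the class and method names are made up for illustration. It uses only the JDK's built-in StAX factory and assumes a document with no XML declaration. When the parser gets the raw bytes it detects the encoding itself, but when something upstream wraps the bytes in a Reader, whatever charset that Reader uses decides what the parser sees.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Abdera or Woodstox.
public class ReaderVsStream {

    // Concatenate all character data the parser reports.
    static String text(XMLStreamReader r) throws Exception {
        StringBuilder sb = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                sb.append(r.getText());
            }
        }
        return sb.toString();
    }

    // The parser sees the raw bytes and works out the encoding itself.
    static String viaStream(byte[] xml) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        return text(xif.createXMLStreamReader(new ByteArrayInputStream(xml)));
    }

    // The caller decodes the bytes first; the parser only ever sees chars.
    static String viaReader(byte[] xml, Charset cs) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        return text(xif.createXMLStreamReader(
                new InputStreamReader(new ByteArrayInputStream(xml), cs)));
    }

    // True if parsing the raw bytes fails for any reason.
    static boolean streamFails(byte[] xml) {
        try {
            viaStream(xml);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // U+4E2D encodes to three bytes in UTF-8 (E4 B8 AD).
        byte[] utf8 = "<a>\u4E2D</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaStream(utf8).equals("\u4E2D"));
        System.out.println(viaReader(utf8, StandardCharsets.UTF_8).equals("\u4E2D"));
        System.out.println(viaReader(utf8, StandardCharsets.ISO_8859_1).equals("\u4E2D"));

        // Latin-1 bytes with no declaration: the parser assumes UTF-8.
        byte[] latin1 = "<a>\u00E9</a>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(streamFails(latin1));
    }
}
```

The point of the last case is that bytes which are fine in one encoding can look like a broken multi-byte UTF-8 sequence when read as another, so a charset mismatch anywhere between the wire and the parser is worth ruling out.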

- James

Chris Berry wrote:
> I added the following JUnit (to the JIRA), which I think proves that
> woodstox 3.2.1 is not the issue.
> It passes fine (no Exceptions thrown).
> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
> Cheers,
> -- Chris
> ===================================
> package com.homeaway.hcdata.store.provider.blogs;
> 
> import junit.framework.Test;
> import junit.framework.TestCase;
> import junit.framework.TestSuite;
> 
> import javax.xml.stream.XMLStreamReader;
> import javax.xml.stream.XMLInputFactory;
> 
> import java.io.FileInputStream;
> 
> import com.ctc.wstx.stax.WstxInputFactory;
> 
> public class WoodstoxTest extends TestCase {
> 
>     private static final String userdir = System.getProperty( "user.dir" );
> 
>     public static Test suite()
>     { return new TestSuite( WoodstoxTest.class ); }
> 
>     public void tearDown() throws Exception
>     { super.tearDown(); }
> 
>     public void setUp() throws Exception
>     { super.setUp(); }
> 
>     public void testWoodstox() throws Exception {
> 
>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
> 
>         // we will simply walk the doc and see if it throws an Exception
>         XMLInputFactory xif = new WstxInputFactory();
>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>         while (r.hasNext()) r.next();
>     }
> }
> 
> 
> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
> 
>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>> I have no idea what's causing this error, but I highly doubt it's
>> woodstox. Woodstox is the most highly conformant XML parser out there.
>> (but I could be wrong)
>>
>> I would strongly suggest avoiding using 2.0.5 though for a number of
>> reasons
>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>> tested with 2.x and it expects the stax api to react a certain way
>> - 3.x is faster
>> - 3.x has improved xml conformance
>>
>> I stepped through the test case a little and wasn't able to see what
>> was going on right away. I would need to get the AXIOM sources to really
>> dig in more - I suspect, after a little bit of digging, that the bug
>> might lie in there, but that is because that's the place I haven't
>> looked yet.
>>
>> Any chance you could catch the message being sent from the server with
>> something like TCPMon and post it to the JIRA issue?
>>
>> - Dan
>>
>> Chris Berry wrote:
>> That fixes it!!!
>>
>> I modified all of the pertinent POMs accordingly;
>> I.e.
>> <!--
>>       <dependency>
>>         <groupId>org.codehaus.woodstox</groupId>
>>         <artifactId>wstx-asl</artifactId>
>>         <version>3.2.1</version>
>>         <scope>runtime</scope>
>>       </dependency>
>> -->
>>       <dependency>
>>         <groupId>woodstox</groupId>
>>         <artifactId>wstx-asl</artifactId>
>>         <version>2.0.5</version>
>>         <scope>runtime</scope>
>>       </dependency>
>>
>> 9 POMs were affected:
>>
>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>
>> I will add this info to the JIRA.
>>
>> James,
>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>
>> FYI: we are using woodstox 3.2.1 in another project with these exact
>> same XMLs without problem??
>>
>> Thanks much,
>> -- Chris
>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>
>> I will try that. I didn't before, because I wasn't sure that it
>> wasn't required somehow internally...
>>
>> BTW: I ran these XML documents with the supposedly invalid chars thru 2
>> different UTF-8 conversions as I read them from disk, before putting
>> them into the <content>.
>> And I also processed them with the Unix "iconv" utility.
>> So I am pretty darn sure that there are no invalid chars in there.
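
A check like the one described above can also be done from Java without any XML parser in the loop. The following is only a sketch with a made-up class name: it decodes the file's bytes with a strict UTF-8 decoder that reports malformed sequences instead of silently replacing them. Passing this check says the bytes on disk are well-formed UTF-8; it says nothing about what happens to them later on the wire or inside the parser stack.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Abdera or Woodstox.
public class Utf8Check {

    // Returns true only if the whole byte array is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode(ByteBuffer) treats the buffer as complete input, so a
            // sequence truncated at the end is reported as malformed too.
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "malformed UTF-8");
    }
}
```

Run it against the same blog_9999.xml file used in the JUnit test to confirm the on-disk bytes independently of woodstox.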
>>
>> Cheers,
>> -- Chris
>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>
>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>> version of woodstox.  If dropping down to an older version addresses the
>> issue, then we can explore that as a solution.
>>
>> - James
>>
>> Chris Berry wrote:
>> Hmmm.
>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>> latest woodstox with Abdera
>> Or more correctly, maven was bringing in some chained dependencies --
>> one of which brought in woodstox 3.2.1.
>> Abdera was using woodstox 2.0.5 at that time.
>> The problem went away when I corrected this problem....
>>
>> Note, if this is your problem, you can work around it with the maven
>> <exclusions> element
>> e.g.
>>         <dependency>
>>           <groupId>com.whatever</groupId>
>>           <artifactId>foo</artifactId>
>>           <version>1.2.3</version>
>>           <exclusions>
>>             <exclusion>
>>               <groupId>org.codehaus.woodstox</groupId>
>>               <artifactId>wstx-lgpl</artifactId>
>>             </exclusion>
>>           </exclusions>
>>         </dependency>
>>
>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>> the woodstox upgrade....
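
When chained dependencies are suspected, it can help to ask the running JVM which StAX implementation it actually picked up and where that class was loaded from. This is a rough sketch with a made-up class name, using only standard JDK calls; it makes no assumption about which woodstox jar is present.

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic, not part of Abdera or Woodstox.
public class WhichStax {

    // Describe a class by name and by the jar or directory it came from.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(no code source, probably JDK classes)"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + where;
    }

    public static void main(String[] args) {
        // newInstance() goes through the normal StAX factory lookup, so this
        // shows the implementation the application would get by default.
        System.out.println(describe(XMLInputFactory.newInstance().getClass()));
    }
}
```

Run with the same classpath as the failing application. If the printed location is not the woodstox jar you expect, the Maven dependency:tree goal should show which dependency pulls in the other one.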
>>
>> Cheers,
>> -- Chris
>>
>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>
>> Hi Chris!
>>
>> Thanks for your feedback!
>>
>> This is exactly the bug I am seeing.
>> AFAICT, it is not related to a missing <?xml version="1.0"
>> encoding="UTF-8"?>.
>> Incidentally, my code worked fine before a recent "svn up" and it has
>> no <?xml version="1.0" encoding="UTF-8"?>.
>>
>> If I understand your problem correctly, it occurs if you parse an
>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>
>> Mine occurs if I use an AbderaClient to create an entry on an
>> external server, which is, btw, a proprietary closed-source thing. The
>> server then gives me the error message while it tries to parse my
>> request.
>>
>> It seems that knowing that another person is seeing the issue
>> confirms that the issue is on Abdera's side...
>>
>> I'm not sure if we both encounter the same problem. My problem also
>> occurs with the AbderaClient 0.22. Yours occurred only after updating to
>> 0.30-snapshot, right?
>>
>> I haven't the slightest idea whether the problem lies in my code, in
>> the Abdera code or even in the server code.
>>
>> My next test would be the creation of an atom-entry by hand without
>> the AbderaClient and provide an "<?xml version="1.0"
>> encoding="UTF-8"?>" to check how the server reacts.
>>
>> Regards,
>>
>> Herbert
>>
>> S'all good  ---   chriswberry at gmail dot com
>>
>> -- 
>> Dan Diephouse
>> MuleSource
>> http://mulesource.com | http://netzooid.com/blog
