abdera-user mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 19:56:27 GMT
Heh.. figures, one platform I can't test.  I can confirm that I am not
seeing this error at all on Windows or Ubuntu using the IBM JDK and
Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
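
If it helps to compare environments, a small report like the sketch below could be run on each machine. It uses only standard JDK calls and assumes the StAX API (javax.xml.stream) is on the classpath (it is bundled from Java 6 on; on 1.5 the stax-api jar is needed). The class name StaxEnvironmentReport is made up for illustration. The default charset is included because it is one setting that can differ between platforms.

```java
import java.nio.charset.Charset;
import javax.xml.stream.XMLInputFactory;

public class StaxEnvironmentReport {

    /** Builds a summary of the JVM, OS, default charset, and StAX factory in use. */
    static String describe() {
        // newInstance() returns whichever StAX implementation the JVM resolves first
        XMLInputFactory factory = XMLInputFactory.newInstance();

        StringBuilder sb = new StringBuilder();
        sb.append("java.vendor=").append(System.getProperty("java.vendor")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("os.name=").append(System.getProperty("os.name")).append('\n');
        sb.append("os.version=").append(System.getProperty("os.version")).append('\n');
        sb.append("default.charset=").append(Charset.defaultCharset().name()).append('\n');
        sb.append("stax.factory=").append(factory.getClass().getName()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Comparing the stax.factory line across machines would show whether the same parser is really being picked up everywhere.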

- James

Chris Berry wrote:
> MacBook Pro -- Mac OS X 10.4
> 
> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
> 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
> 
> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
> java version "1.5.0_07"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
> 
> Thanks,
> -- Chris 
> 
> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
> 
>> Hmmm... well, I ran your test cases and have not been able to recreate
>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
>> latest woodstox and the stax parser that ships with Websphere, and was
>> completely unable to get the test to throw any kind of UTF-8 related
>> errors.
>>
>> What operating system are you testing on?  What JDK?
>>
>> - James
>>
>> Chris Berry wrote:
>>> I added the following JUnit (to the JIRA), which I think proves that
>>> woodstox 3.2.1 is not the issue.
>>> It passes fine (no Exceptions thrown).
>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>> Cheers,
>>> -- Chris
>>> ===================================
>>> package com.homeaway.hcdata.store.provider.blogs;
>>>
>>> import junit.framework.Test;
>>> import junit.framework.TestCase;
>>> import junit.framework.TestSuite;
>>>
>>> import javax.xml.stream.XMLStreamReader;
>>> import javax.xml.stream.XMLInputFactory;
>>>
>>> import java.io.FileInputStream;
>>>
>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>
>>> public class WoodstoxTest extends TestCase {
>>>
>>>     private static final String userdir = System.getProperty( "user.dir" );
>>>
>>>     public static Test suite()
>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>
>>>     public void tearDown() throws Exception
>>>     { super.tearDown(); }
>>>
>>>     public void setUp() throws Exception
>>>     { super.setUp(); }
>>>
>>>     public void testWoodstox() throws Exception {
>>>
>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
>>>
>>>         // we will simply walk the doc and see if it throws an Exception
>>>         XMLInputFactory xif = new WstxInputFactory();
>>>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>>>         while (r.hasNext()) r.next();
>>>     }
>>> }
>>>
>>>
>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>
>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>> I have no idea what's causing this error, but I highly doubt it's
>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
>>>> (but I could be wrong)
>>>>
>>>> I would strongly suggest avoiding 2.0.5, though, for a number of
>>>> reasons:
>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>> tested with 2.x and it expects the stax api to react a certain way
>>>> - 3.x is faster
>>>> - 3.x has improved xml conformance
>>>>
>>>> I stepped through the test case a little and wasn't able to see what
>>>> was going on right away. I would need to get the AXIOM sources to really
>>>> dig in more - I suspect the bug might lie in there after a little bit
>>>> of digging, but that is because that's the place I haven't looked yet.
>>>>
>>>> Any chance you could catch the message being sent from the server with
>>>> something like TCPMon and post it to the JIRA issue?
>>>>
>>>> - Dan
>>>>
>>>> Chris Berry wrote:
>>>> That fixes it!!!
>>>>
>>>> I modified all of the pertinent POMs accordingly;
>>>> I.e.
>>>> <!--
>>>>       <dependency>
>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>         <artifactId>wstx-asl</artifactId>
>>>>         <version>3.2.1</version>
>>>>         <scope>runtime</scope>
>>>>       </dependency>
>>>> -->
>>>>       <dependency>
>>>>         <groupId>woodstox</groupId>
>>>>         <artifactId>wstx-asl</artifactId>
>>>>         <version>2.0.5</version>
>>>>         <scope>runtime</scope>
>>>>       </dependency>
>>>>
>>>> 9 POMs were affected:
>>>>
>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>
>>>> I will add this info to the JIRA.
>>>>
>>>> James,
>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>
>>>> FYI: we are using woodstox 3.2.1 in another project with these exact
>>>> same XMLs without problem??
>>>>
>>>> Thanks much,
>>>> -- Chris
>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>
>>>> I will try that. I didn't before, because I wasn't sure that it
>>>> wasn't required somehow internally...
>>>>
>>>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>>>> different UTF-8 conversions as I read them from disk, before putting
>>>> them into the <content>
>>>> And I also processed them with the Unix "iconv" utility
>>>> So I am pretty darn sure that there are no invalid chars in there.
>>>>
>>>> Cheers,
>>>> -- Chris
>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>
>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>>>> version of woodstox.  If dropping down to an older version addresses the
>>>> issue, then we can explore that as a solution.
>>>>
>>>> - James
>>>>
>>>> Chris Berry wrote:
>>>> Hmmm.
>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>>> latest woodstox with Abdera
>>>> Or more correctly, maven was bringing in some chained dependencies --
>>>> one of which brought in woodstox 3.2.1.
>>>> Abdera was using woodstox 2.0.5 at that time.
>>>> The problem went away when I corrected this problem....
>>>>
>>>> Note, if this is your problem, you can work around it with the maven
>>>> <exclusions> element
>>>> e.g.
>>>>         <dependency>
>>>>           <groupId>com.whatever</groupId>
>>>>           <artifactId>foo</artifactId>
>>>>           <version>1.2.3</version>
>>>>           <exclusions>
>>>>             <exclusion>
>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>             </exclusion>
>>>>           </exclusions>
>>>>         </dependency>
>>>>
>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>>>> the woodstox upgrade....
>>>>
>>>> Cheers,
>>>> -- Chris
>>>>
>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>>
>>>> Hi Chris!
>>>>
>>>> Thanks for your feedback!
>>>>
>>>> This is exactly the bug I am seeing.
>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>> encoding="UTF-8"?>,
>>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>>> no <?xml version="1.0" encoding="UTF-8"?>,
>>>>
>>>> If I understand your problem correctly, it occurs, if you parse an
>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>>>
>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>> external server, which is btw a proprietary closed-source thing. The
>>>> server then gives me the error message while it tries to parse my
>>>> request.
>>>>
>>>> It seems that knowing that another person is seeing the issue
>>>> confirms that the issue is on Abdera's side...
>>>>
>>>> I'm not sure if we both encounter the same problem. My problem also
>>>> occurs with the AbderaClient 0.22. Yours occurred only after updating to
>>>> 0.30-snapshot, right?
>>>>
>>>> I haven't the slightest idea, whether the problem lies in my code, in
>>>> the abdera-code or even in the server-code.
>>>>
>>>> My next test would be the creation of an atom-entry by hand without
>>>> the AbderaClient and provide an "<?xml version="1.0"
>>>> encoding="UTF-8"?>" to check how the server reacts.
>>>>
>>>> Regards,
>>>>
>>>> Herbert
>>>>
>>>> S'all good  ---   chriswberry at gmail dot com
>>>>
>>>> -- 
>>>> Dan Diephouse
>>>> MuleSource
>>>> http://mulesource.com | http://netzooid.com/blog
> 
> S'all good  ---   chriswberry at gmail dot com
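
For anyone who wants to repeat the iconv-style check from Java, a strict decode along these lines should flag a bad byte inside a multi-byte sequence. This is only a sketch: it assumes Java 7 or later (StandardCharsets, Files), and the class and method names (Utf8Check, isValidUtf8) are made up for illustration. It says nothing about what the parser does, only whether the bytes on disk are well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 with no malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the XML file to check
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": "
                + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
    }
}
```

If the file passes this check but the parser still complains, that would point at how the bytes are being read or re-encoded on the way in rather than at the document itself.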
