abdera-user mailing list archives

From Dan Diephouse <dan.diepho...@mulesource.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 21:07:11 GMT
I'll throw one more thing into the mix since someone mentioned 
"Readers". As I understand it, some Reader implementations are buggy on 
certain JDKs. I don't know if the code path uses a reader, but if it 
does it might be worth trying to switch it to use an InputStream instead.
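
(A minimal sketch of that switch, using only the JDK's StAX API. The
class and method names are made up for illustration, and nothing here is
taken from the Abdera or Axiom code path. Handing the parser an
InputStream lets it decode the bytes itself; handing it a Reader means
whoever built the Reader has already picked the charset, so a wrong or
buggy decoder corrupts the text before the parser ever sees it.)

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamVsReaderSketch {

    // Concatenate all character data found in the document.
    static String readText(XMLStreamReader r) throws Exception {
        StringBuilder sb = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                sb.append(r.getText());
            }
        }
        r.close();
        return sb.toString();
    }

    // The parser receives raw bytes and does the decoding itself.
    static String parseFromStream(byte[] xml) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        InputStream in = new ByteArrayInputStream(xml);
        return readText(xif.createXMLStreamReader(in));
    }

    // The bytes are decoded by the Reader before the parser sees them.
    static String parseFromReader(byte[] xml, Charset cs) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), cs);
        return readText(xif.createXMLStreamReader(reader));
    }

    public static void main(String[] args) throws Exception {
        // U+4E2D takes three bytes in UTF-8 (E4 B8 AD).
        byte[] xml = "<t>\u4E2D</t>".getBytes(StandardCharsets.UTF_8);
        String viaStream = parseFromStream(xml);
        String viaReader = parseFromReader(xml, StandardCharsets.ISO_8859_1);
        System.out.println("stream path intact: " + viaStream.equals("\u4E2D"));
        System.out.println("reader path intact: " + viaReader.equals("\u4E2D"));
    }
}
```

With the Reader deliberately built as ISO-8859-1, the single 3-byte
character comes back as three Latin-1 characters, while the InputStream
path returns it unchanged.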

- Dan

Chris Berry wrote:
> hmmmm.
> The Sun vs IBM JDK is worth a try...
>
> On Sep 4, 2007, at 2:56 PM, James M Snell wrote:
>
>> Heh.. figures, one platform I can't test.  I can confirm that I am not
>> seeing this error at all on Windows or Ubuntu using the IBM JDK and
>> Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
>>
>> - James
>>
>> Chris Berry wrote:
>>> Macbook Pro -- MAC OS-X 10.3
>>>
>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
>>> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
>>> 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
>>>
>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
>>> java version "1.5.0_07"
>>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
>>> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
>>>
>>> Thanks,
>>> -- Chris
>>> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
>>>
>>>> Hmmm... well, I ran your test cases and have not been able to recreate
>>>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
>>>> latest woodstox and the stax parser that ships with Websphere, and was
>>>> completely unable to get the test to throw any kind of UTF-8 related
>>>> errors.
>>>>
>>>> What operating system are you testing on?  What JDK?
>>>>
>>>> - James
>>>>
>>>> Chris Berry wrote:
>>>>> I added the following JUnit (to the JIRA), which I think proves that
>>>>> woodstox 3.2.1 is not the issue.
>>>>> It passes fine (no Exceptions thrown).
>>>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>>>> Cheers,
>>>>> -- Chris
>>>>> ===================================
>>>>> package com.homeaway.hcdata.store.provider.blogs;
>>>>>
>>>>> import junit.framework.Test;
>>>>> import junit.framework.TestCase;
>>>>> import junit.framework.TestSuite;
>>>>>
>>>>> import javax.xml.stream.XMLStreamReader;
>>>>> import javax.xml.stream.XMLInputFactory;
>>>>>
>>>>> import java.io.FileInputStream;
>>>>>
>>>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>>>
>>>>> public class WoodstoxTest extends TestCase {
>>>>>
>>>>>     private static final String userdir = System.getProperty("user.dir");
>>>>>
>>>>>     public static Test suite()
>>>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>>>
>>>>>     public void tearDown() throws Exception
>>>>>     { super.tearDown(); }
>>>>>
>>>>>     public void setUp() throws Exception
>>>>>     { super.setUp(); }
>>>>>
>>>>>     public void testWoodstox() throws Exception {
>>>>>
>>>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
>>>>>
>>>>>         // we will simply walk the doc and see if it throws an Exception
>>>>>         XMLInputFactory xif = new WstxInputFactory();
>>>>>         FileInputStream in = new FileInputStream(filename);
>>>>>         try {
>>>>>             XMLStreamReader r = xif.createXMLStreamReader(in);
>>>>>             while (r.hasNext()) r.next();
>>>>>         } finally {
>>>>>             // always release the file handle, even if parsing throws
>>>>>             in.close();
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>>>
>>>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>>>> I have no idea what's causing this error, but I highly doubt it's
>>>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
>>>>>> (but I could be wrong)
>>>>>>
>>>>>> I would strongly suggest avoiding 2.0.5 though, for a number of
>>>>>> reasons:
>>>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>>>>   tested with 2.x and it expects the stax api to react a certain way
>>>>>> - 3.x is faster
>>>>>> - 3.x has improved xml conformance
>>>>>>
>>>>>> I stepped through the test case a little and wasn't able to see what
>>>>>> was going on right away. I would need to get the AXIOM sources to really
>>>>>> dig in more - I suspect the bug might lie in there after a little bit
>>>>>> of digging, but that is because that's the place I haven't looked yet.
>>>>>>
>>>>>> Any chance you could catch the message being sent from the server with
>>>>>> something like TCPMon and post it to the JIRA issue?
>>>>>>
>>>>>> - Dan
>>>>>>
>>>>>> Chris Berry wrote:
>>>>>> That fixes it!!!
>>>>>>
>>>>>> I modified all of the pertinent POMs accordingly;
>>>>>> I.e.
>>>>>> <!--
>>>>>>       <dependency>
>>>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>         <version>3.2.1</version>
>>>>>>         <scope>runtime</scope>
>>>>>>       </dependency>
>>>>>> -->
>>>>>>       <dependency>
>>>>>>         <groupId>woodstox</groupId>
>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>         <version>2.0.5</version>
>>>>>>         <scope>runtime</scope>
>>>>>>       </dependency>
>>>>>>
>>>>>> 9 POMs were affected:
>>>>>>
>>>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>>>
>>>>>> I will add this info to the JIRA.
>>>>>>
>>>>>> James,
>>>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>>>
>>>>>> FYI: we are using woodstox 3.2.1 in another project with these exact
>>>>>> same XMLs without problem??
>>>>>>
>>>>>> Thanks much,
>>>>>> -- Chris
>>>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>>>
>>>>>> I will try that. I didn't before, because I wasn't sure that it
>>>>>> wasn't required somehow internally...
>>>>>>
>>>>>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>>>>>> different UTF-8 conversions as I read them from disk, before putting
>>>>>> them into the <content>.
>>>>>> And I also processed them with the Unix "iconv" utility.
>>>>>> So I am pretty darn sure that there are no invalid chars in there.
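
The same kind of check can also be done from Java with a strict decoder.
A minimal sketch, assuming a Java 7+ JDK (for StandardCharsets and
Files); the class name and the isValidUtf8 helper are hypothetical and
not part of Abdera or woodstox:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8CheckSketch {

    // Returns true only if the whole byte array decodes as UTF-8.
    // REPORT makes the decoder throw instead of silently substituting
    // U+FFFD for bad input, which is the JDK's default for String decoding.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the file to check.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "invalid UTF-8");
    }
}
```

If a file passes a strict check like this, the bytes on disk are
well-formed UTF-8, which points the search at whatever re-encodes the
content later rather than at the documents themselves.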
>>>>>>
>>>>>> Cheers,
>>>>>> -- Chris
>>>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>>>
>>>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>>>>>> version of woodstox.  If dropping down to an older version addresses the
>>>>>> issue, then we can explore that as a solution.
>>>>>>
>>>>>> - James
>>>>>>
>>>>>> Chris Berry wrote:
>>>>>> Hmmm.
>>>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>>>>> latest woodstox with Abdera.
>>>>>> Or more correctly, maven was bringing in some chained dependencies --
>>>>>> one of which brought in woodstox 3.2.1.
>>>>>> Abdera was using woodstox 2.0.5 at that time.
>>>>>> The problem went away when I corrected this....
>>>>>>
>>>>>> Note, if this is your problem, you can work around it with the maven
>>>>>> <exclusions> element
>>>>>> e.g.
>>>>>>         <dependency>
>>>>>>           <groupId>com.whatever</groupId>
>>>>>>           <artifactId>foo</artifactId>
>>>>>>           <version>1.2.3</version>
>>>>>>           <exclusions>
>>>>>>             <exclusion>
>>>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>>>             </exclusion>
>>>>>>           </exclusions>
>>>>>>         </dependency>
>>>>>>
>>>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>>>>>> the woodstox upgrade....
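
When chained dependencies are suspected, it can help to ask the running
JVM which StAX implementation it actually picked up and where it was
loaded from. A minimal sketch using only standard JDK calls; the class
name and the describe helper are hypothetical:

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

public class WhichStaxSketch {

    // Report a class name together with the jar or directory it came from.
    // Classes loaded by the bootstrap loader may have no code source at all.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(no code source, probably the JDK itself)"
                : src.getLocation().toString();
        return c.getName() + " from " + where;
    }

    public static void main(String[] args) {
        // newInstance() follows the standard StAX lookup, so this is the
        // factory any code relying on that lookup would also get.
        XMLInputFactory xif = XMLInputFactory.newInstance();
        System.out.println(describe(xif.getClass()));
    }
}
```

Run with the same classpath as the failing test; if woodstox is in use,
the printed location should show which woodstox jar won.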
>>>>>>
>>>>>> Cheers,
>>>>>> -- Chris
>>>>>>
>>>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>>>>
>>>>>> Hi Chris!
>>>>>>
>>>>>> Thanks for your feedback!
>>>>>>
>>>>>> This is exactly the bug I am seeing.
>>>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>>>> encoding="UTF-8"?>.
>>>>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>>>>> no <?xml version="1.0" encoding="UTF-8"?>.
>>>>>>
>>>>>> If I understand your problem correctly, it occurs if you parse an
>>>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>>>>>
>>>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>>>> external server, which is btw a proprietary closed-source thing. The
>>>>>> server then gives me the error message while it tries to parse my
>>>>>> request.
>>>>>>
>>>>>> It seems that knowing that another person is seeing the issue
>>>>>> confirms that the issue is on Abdera's side...
>>>>>>
>>>>>> I'm not sure if we both encounter the same problem. My problem also
>>>>>> occurs with the AbderaClient 0.22. Yours occurred only after updating to
>>>>>> 0.30-snapshot, right?
>>>>>>
>>>>>> I haven't the slightest idea whether the problem lies in my code, in
>>>>>> the abdera-code or even in the server-code.
>>>>>>
>>>>>> My next test would be to create an atom-entry by hand, without
>>>>>> the AbderaClient, and provide an "<?xml version="1.0"
>>>>>> encoding="UTF-8"?>" to check how the server reacts.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Herbert
>>>>>>
>>>>>>
>>>>>>
>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>> -- 
>>>>>> Dan Diephouse
>>>>>> MuleSource
>>>>>> http://mulesource.com | http://netzooid.com/blog


-- 
Dan Diephouse
MuleSource
http://mulesource.com | http://netzooid.com/blog

