abdera-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 20:04:08 GMT
hmmmm.
The Sun vs IBM JDK is worth a try...
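
To make the Sun vs IBM comparison easier to report back, something like the following could be run on each box. It is only a sketch: the class name JvmReport is made up for illustration, and the default charset is just one thing that can differ between platforms. Nothing in this thread confirms it is related to the error.

```java
import java.nio.charset.Charset;

final class JvmReport {

    /** Builds a one-line summary of the JVM, the OS and the platform default charset. */
    static String describe() {
        return System.getProperty("java.vendor") + " " + System.getProperty("java.version")
                + " on " + System.getProperty("os.name") + " " + System.getProperty("os.version")
                + " (" + System.getProperty("os.arch") + ")"
                + ", default charset " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Pasting the one-line output from each machine into the JIRA would show at a glance which combinations fail.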

On Sep 4, 2007, at 2:56 PM, James M Snell wrote:

> Heh.. figures, one platform I can't test.  I can confirm that I am not
> seeing this error at all on Windows or Ubuntu using the IBM JDK and
> Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
>
> - James
>
> Chris Berry wrote:
>> Macbook Pro -- MAC OS-X 10.3
>>
>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
>> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
>> 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
>>
>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
>> java version "1.5.0_07"
>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
>> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
>>
>> Thanks,
>> -- Chris 
>>
>> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
>>
>>> Hmmm... well, I ran your test cases and have not been able to recreate
>>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
>>> latest woodstox and the stax parser that ships with Websphere, and was
>>> completely unable to get the test to throw any kind of UTF-8 related
>>> errors.
>>>
>>> What operating system are you testing on?  What JDK?
>>>
>>> - James
>>>
>>> Chris Berry wrote:
>>>> I added the following JUnit (to the JIRA), which I think proves that
>>>> woodstox 3.2.1 is not the issue.
>>>> It passes fine (no Exceptions thrown).
>>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>>> Cheers,
>>>> -- Chris
>>>> ===================================
>>>> package com.homeaway.hcdata.store.provider.blogs;
>>>>
>>>> import junit.framework.Test;
>>>> import junit.framework.TestCase;
>>>> import junit.framework.TestSuite;
>>>>
>>>> import javax.xml.stream.XMLStreamReader;
>>>> import javax.xml.stream.XMLInputFactory;
>>>>
>>>> import java.io.FileInputStream;
>>>>
>>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>>
>>>> public class WoodstoxTest extends TestCase {
>>>>
>>>>     private static final String userdir = System.getProperty( "user.dir" );
>>>>
>>>>     public static Test suite()
>>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>>
>>>>     public void tearDown() throws Exception
>>>>     { super.tearDown(); }
>>>>
>>>>     public void setUp() throws Exception
>>>>     { super.setUp(); }
>>>>
>>>>     public void testWoodstox() throws Exception {
>>>>
>>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
>>>>
>>>>         // we will simply walk the doc and see if it throws an Exception
>>>>         XMLInputFactory xif = new WstxInputFactory();
>>>>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>>>>         while (r.hasNext()) r.next();
>>>>     }
>>>> }
>>>>
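
The test above instantiates WstxInputFactory directly, so it may not exercise the same parser that gets picked up at runtime through the standard StAX lookup. As a rough sketch (the class name StaxImplReport is made up, and it assumes the standard javax.xml.stream lookup is what the failing code path uses, which is not confirmed in this thread), the following prints which factory class is resolved and where it was loaded from:

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

final class StaxImplReport {

    /** Describes an object's class and, when available, the location it was loaded from. */
    static String describe(Object factory) {
        Class<?> type = factory.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader may have no code source at all.
        String location = (source == null || source.getLocation() == null)
                ? "(no code source, likely bundled with the JRE)"
                : source.getLocation().toString();
        return type.getName() + " from " + location;
    }

    public static void main(String[] args) {
        // Resolves the factory through the standard StAX lookup on the current classpath.
        System.out.println(describe(XMLInputFactory.newInstance()));
    }
}
```

Running it with the same classpath as the failing test would show whether a 2.x or 3.x woodstox jar (or some other StAX implementation) is the one actually in use.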
>>>>
>>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>>
>>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>>> I have no idea what's causing this error, but I highly doubt it's
>>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
>>>>> (but I could be wrong)
>>>>>
>>>>> I would strongly suggest avoiding 2.0.5 though, for a number of
>>>>> reasons:
>>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>>> tested with 2.x and it expects the stax api to react a certain way
>>>>> - 3.x is faster
>>>>> - 3.x has improved xml conformance
>>>>>
>>>>> I stepped through the test case a little and wasn't able to see what
>>>>> was going on right away. I would need to get the AXIOM sources to really
>>>>> dig in more - I suspect the bug might lie in there after a little bit
>>>>> of digging, but that is because that's the place I haven't looked yet.
>>>>>
>>>>> Any chance you could catch the message being sent from the server with
>>>>> something like TCPMon and post it to the JIRA issue?
>>>>>
>>>>> - Dan
>>>>>
>>>>> Chris Berry wrote:
>>>>> That fixes it!!!
>>>>>
>>>>> I modified all of the pertinent POMs accordingly;
>>>>> I.e.
>>>>> <!--
>>>>>       <dependency>
>>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>         <version>3.2.1</version>
>>>>>         <scope>runtime</scope>
>>>>>       </dependency>
>>>>> -->
>>>>>       <dependency>
>>>>>         <groupId>woodstox</groupId>
>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>         <version>2.0.5</version>
>>>>>         <scope>runtime</scope>
>>>>>       </dependency>
>>>>>
>>>>> 9 POMs were affected:
>>>>>
>>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>>
>>>>> I will add this info to the JIRA.
>>>>>
>>>>> James,
>>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>>
>>>>> FYI: we are using woodstox 3.2.1 in another project with these  
>>>>> exact
>>>>> same XMLs without problem??
>>>>>
>>>>> Thanks much,
>>>>> -- Chris
>>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>>
>>>>> I will try that. I didn't before, because I wasn't sure that it
>>>>> wasn't required somehow internally...
>>>>>
>>>>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>>>>> different UTF-8 conversions as I read them from disk, before putting
>>>>> them into the <content>
>>>>> And I also processed them with the Unix "iconv" utility
>>>>> So I am pretty darn sure that there are no invalid chars in there.
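
For anyone wanting to repeat that kind of check from Java, a strict decoder can stand in for iconv. This is only a sketch, not the conversion code actually used above: the class name Utf8Check is made up, and it assumes a JDK 7 or later (for StandardCharsets and Files), which is newer than the 1.5 JVM mentioned in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Utf8Check {

    /** Returns true only if the bytes decode as UTF-8 with no malformed or truncated sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a file to check, e.g. one of the blog XML documents.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "INVALID UTF-8"));
        }
    }
}
```

If the files on disk pass a check like this, the bad bytes are more likely introduced somewhere between reading the file and the parser, for instance by a String/byte conversion that uses the platform default charset.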
>>>>>
>>>>> Cheers,
>>>>> -- Chris
>>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>>
>>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>>>>> version of woodstox.  If dropping down to an older version addresses the
>>>>> issue, then we can explore that as a solution.
>>>>>
>>>>> - James
>>>>>
>>>>> Chris Berry wrote:
>>>>> Hmmm.
>>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>>>> latest woodstox with Abdera.
>>>>> Or more correctly, maven was bringing in some chained dependencies --
>>>>> one of which brought in woodstox 3.2.1.
>>>>> Abdera was using woodstox 2.0.5 at that time.
>>>>> The problem went away when I corrected this mismatch....
>>>>>
>>>>> Note, if this is your problem, you can work around it with the maven
>>>>> <exclusions> element
>>>>> e.g.
>>>>>         <dependency>
>>>>>           <groupId>com.whatever</groupId>
>>>>>           <artifactId>foo</artifactId>
>>>>>           <version>1.2.3</version>
>>>>>           <exclusions>
>>>>>             <exclusion>
>>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>>             </exclusion>
>>>>>           </exclusions>
>>>>>         </dependency>
>>>>>
>>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>>>>> the woodstox upgrade....
>>>>>
>>>>> Cheers,
>>>>> -- Chris
>>>>>
>>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>>>
>>>>> Hi Chris!
>>>>>
>>>>> Thanks for your feedback!
>>>>>
>>>>> This is exactly the bug I am seeing.
>>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>>> encoding="UTF-8"?>.
>>>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>>>> no <?xml version="1.0" encoding="UTF-8"?>.
>>>>>
>>>>> If I understand your problem correctly, it occurs if you parse an
>>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>>>>
>>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>>> external server, which is btw a proprietary closed-source thingy. The
>>>>> server then gives me the error message while it tries to parse my
>>>>> request.
>>>>>
>>>>> It seems that knowing that another person is seeing the issue
>>>>> confirms that the issue is on Abdera's side...
>>>>>
>>>>> I'm not sure if we both encounter the same problem. My problem occurs
>>>>> also with the AbderaClient 0.22. Yours occurred only after updating to
>>>>> 0.30-snapshot, right?
>>>>>
>>>>> I haven't the slightest idea whether the problem lies in my code, in
>>>>> the abdera-code or even in the server-code.
>>>>>
>>>>> My next test would be creating an atom-entry by hand without
>>>>> the AbderaClient and providing an "<?xml version="1.0"
>>>>> encoding="UTF-8"?>" to check how the server reacts.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Herbert
>>>>>
>>>>>
>>>>>
>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>> -- 
>>>>> Dan Diephouse
>>>>> MuleSource
>>>>> http://mulesource.com | http://netzooid.com/blog

S'all good  ---   chriswberry at gmail dot com



