abdera-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 19:34:46 GMT
MacBook Pro -- Mac OS X 10.4 (Darwin 8.10.1)

dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386

dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
java version "1.5.0_07"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
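
Since the failure shows up here but not on Ubuntu with the IBM JDK, the JVM's default charset might be worth comparing alongside the OS and JDK versions. A minimal sketch, assuming nothing beyond the JDK (the `EncodingReport` class is a hypothetical helper, not part of Abdera or Woodstox):

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints the JVM settings that most often differ
// between platforms when bytes are converted to and from characters.
final class EncodingReport {

    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("os.name=").append(System.getProperty("os.name")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("file.encoding=").append(System.getProperty("file.encoding")).append('\n');
        // Charset used when no explicit charset is given, e.g. by
        // new String(bytes) and String.getBytes().
        sb.append("default charset=").append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

If the two machines report different default charsets, any code path that converts bytes without naming a charset would behave differently on each. That is only a possibility to rule out, not a diagnosis.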

Thanks,
-- Chris 

On Sep 4, 2007, at 2:19 PM, James M Snell wrote:

> Hmmm... well, I ran your test cases and have not been able to recreate
> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
> latest woodstox and the stax parser that ships with Websphere, and was
> completely unable to get the test to throw any kind of UTF-8 related
> errors.
>
> What operating system are you testing on?  What JDK?
>
> - James
>
> Chris Berry wrote:
>> I added the following JUnit (to the JIRA), which I think proves that
>> woodstox 3.2.1 is not the issue.
>> It passes fine (no Exceptions thrown).
>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>> Cheers,
>> -- Chris
>> ===================================
>> package com.homeaway.hcdata.store.provider.blogs;
>>
>> import junit.framework.Test;
>> import junit.framework.TestCase;
>> import junit.framework.TestSuite;
>>
>> import javax.xml.stream.XMLStreamReader;
>> import javax.xml.stream.XMLInputFactory;
>>
>> import java.io.FileInputStream;
>>
>> import com.ctc.wstx.stax.WstxInputFactory;
>>
>> public class WoodstoxTest extends TestCase {
>>
>>     private static final String userdir = System.getProperty( "user.dir" );
>>
>>     public static Test suite()
>>     { return new TestSuite( WoodstoxTest.class ); }
>>
>>     public void tearDown() throws Exception
>>     { super.tearDown(); }
>>
>>     public void setUp() throws Exception
>>     { super.setUp(); }
>>
>>     public void testWoodstox() throws Exception {
>>
>>         String filename = userdir +
>>             "/var/blogs/cberry/99/9999/en/blog_9999.xml";
>>
>>         // we will simply walk the doc and see if it throws an Exception
>>         XMLInputFactory xif = new WstxInputFactory();
>>         FileInputStream in = new FileInputStream( filename );
>>         try {
>>             XMLStreamReader r = xif.createXMLStreamReader( in );
>>             while (r.hasNext()) r.next();
>>             r.close();
>>         } finally {
>>             // XMLStreamReader.close() does not close the underlying stream
>>             in.close();
>>         }
>>     }
>> }
>>
>>
>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>
>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>> I have no idea what's causing this error, but I highly doubt it's
>>> woodstox. Woodstox is the most highly conformant XML parser out there.
>>> (but I could be wrong)
>>>
>>> I would strongly suggest avoiding 2.0.5, though, for a number of
>>> reasons:
>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>   tested with 2.x and it expects the stax api to react a certain way
>>> - 3.x is faster
>>> - 3.x has improved xml conformance
>>>
>>> I stepped through the test case a little and wasn't able to see what
>>> was going on right away. I would need to get the AXIOM sources to really
>>> dig in more. I suspect the bug might lie in there after a little bit
>>> of digging, but that is because that's the place I haven't looked yet.
>>>
>>> Any chance you could catch the message being sent from the server with
>>> something like TCPMon and post it to the JIRA issue?
>>>
>>> - Dan
>>>
>>> Chris Berry wrote:
>>> That fixes it!!!
>>>
>>> I modified all of the pertinent POMs accordingly;
>>> I.e.
>>> <!--
>>>       <dependency>
>>>         <groupId>org.codehaus.woodstox</groupId>
>>>         <artifactId>wstx-asl</artifactId>
>>>         <version>3.2.1</version>
>>>         <scope>runtime</scope>
>>>       </dependency>
>>> -->
>>>       <dependency>
>>>         <groupId>woodstox</groupId>
>>>         <artifactId>wstx-asl</artifactId>
>>>         <version>2.0.5</version>
>>>         <scope>runtime</scope>
>>>       </dependency>
>>>
>>> 9 POMs were affected:
>>>
>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>
>>> I will add this info to the JIRA.
>>>
>>> James,
>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>
>>> FYI: we are using woodstox 3.2.1 in another project with these exact
>>> same XMLs without problem??
>>>
>>> Thanks much,
>>> -- Chris
>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>
>>> I will try that. I didn't before, because I wasn't sure that it
>>> wasn't required somehow internally...
>>>
>>> BTW: I ran these XML documents with the supposedly invalid chars through
>>> 2 different UTF-8 conversions as I read them from disk, before putting
>>> them into the <content>.
>>> And I also processed them with the Unix "iconv" utility.
>>> So I am pretty darn sure that there are no invalid chars in there.
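
A strict decode is one way to double-check that claim from Java. The sketch below is only an illustration of that kind of check, not the conversion code actually used here; `Utf8Check` is a hypothetical name. It configures a `CharsetDecoder` to report malformed input instead of silently replacing it, which is what makes the check meaningful:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw on bad input; the default behaviour
        // of new String(bytes, "UTF-8") is to substitute U+FFFD instead.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Check path/to/file.xml
    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream(args[0]);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        } finally {
            in.close();
        }
        boolean ok = isValidUtf8(buf.toByteArray());
        System.out.println(args[0] + (ok ? ": valid UTF-8" : ": NOT valid UTF-8"));
    }
}
```

A file that passes this check contains no malformed sequences on disk, so an "invalid byte" error would have to be introduced somewhere after the file is read.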
>>>
>>> Cheers,
>>> -- Chris
>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>
>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>>> version of woodstox.  If dropping down to an older version addresses the
>>> issue, then we can explore that as a solution.
>>>
>>> - James
>>>
>>> Chris Berry wrote:
>>> Hmmm.
>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>> latest woodstox with Abdera.
>>> Or more correctly, maven was bringing in some chained dependencies --
>>> one of which brought in woodstox 3.2.1.
>>> Abdera was using woodstox 2.0.5 at that time.
>>> The problem went away when I corrected this problem....
>>>
>>> Note, if this is your problem, you can work around it with the maven
>>> <exclusions> element,
>>> e.g.
>>>         <dependency>
>>>           <groupId>com.whatever</groupId>
>>>           <artifactId>foo</artifactId>
>>>           <version>1.2.3</version>
>>>           <exclusions>
>>>             <exclusion>
>>>               <groupId>org.codehaus.woodstox</groupId>
>>>               <artifactId>wstx-lgpl</artifactId>
>>>             </exclusion>
>>>           </exclusions>
>>>         </dependency>
>>>
>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>>> the woodstox upgrade....
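
When chained dependencies are in play, it can help to confirm which StAX implementation the JVM actually picks up at runtime. This is a rough sketch using only standard JDK and StAX API calls; `StaxProviderReport` is a hypothetical helper name:

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper: reports the concrete class behind an object and the
// jar or directory it was loaded from.
final class StaxProviderReport {

    static String describe(Object factory) {
        Class<?> c = factory.getClass();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // The code source can be null for classes that ship with the JDK.
        String location = (src == null || src.getLocation() == null)
                ? "(JDK built-in or unknown)"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        // newInstance() uses the standard StAX lookup, so this shows
        // whichever implementation wins on the current classpath.
        System.out.println(describe(XMLInputFactory.newInstance()));
    }
}
```

Run with the same classpath as the failing test, the output should name the jar that supplies the parser, which makes an unexpected second woodstox version easy to spot.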
>>>
>>> Cheers,
>>> -- Chris
>>>
>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>
>>> Hi Chris!
>>>
>>> Thanks for your feedback!
>>>
>>> This is exactly the bug I am seeing.
>>> AFAICT, it is not related to a missing
>>> <?xml version="1.0" encoding="UTF-8"?>.
>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>> no <?xml version="1.0" encoding="UTF-8"?>.
>>>
>>> If I understand your problem correctly, it occurs if you parse an
>>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>>
>>> Mine occurs if I use an AbderaClient to create an entry on an
>>> external server, which is btw a proprietary, closed-source thing. The
>>> server then gives me the error message while it tries to parse my
>>> request.
>>>
>>> It seems that knowing that another person is seeing the issue
>>> confirms that the issue is on Abdera's side...
>>>
>>> I'm not sure if we both encounter the same problem. My problem also
>>> occurs with the AbderaClient 0.22. Yours occurred only after updating to
>>> 0.30-snapshot, right?
>>>
>>> I haven't the slightest idea whether the problem lies in my code, in
>>> the Abdera code or even in the server code.
>>>
>>> My next test would be to create an Atom entry by hand, without
>>> the AbderaClient, and provide an <?xml version="1.0"
>>> encoding="UTF-8"?> declaration to check how the server reacts.
>>>
>>> Regards,
>>>
>>> Herbert
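
The hand-built entry test described above could be sketched roughly as follows. The class and method names are hypothetical and the entry is deliberately minimal: a real Atom entry also needs at least `atom:id` and `atom:updated`. The part that matters for this thread is that the bytes are produced with an explicit UTF-8 charset, so they match the XML declaration:

```java
import java.nio.charset.Charset;

// Hypothetical helper: builds a bare-bones Atom entry as UTF-8 bytes
// without going through AbderaClient.
final class HandBuiltEntry {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Assumes the title contains no markup characters such as '<' or '&';
    // it is inserted verbatim with no escaping.
    static byte[] toUtf8Bytes(String title) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<title>" + title + "</title>"
                + "</entry>";
        // Name the charset explicitly; getBytes() with no argument would use
        // the platform default, which may not be UTF-8.
        return xml.getBytes(UTF8);
    }

    public static void main(String[] args) {
        byte[] body = toUtf8Bytes("caf\u00e9");
        System.out.println(body.length + " bytes ready to POST");
    }
}
```

Posting these bytes with any plain HTTP client and comparing the server's reaction against the AbderaClient request would help show whether the client-side serialization is involved.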
>>>
>>>
>>>
>>> S'all good  ---   chriswberry at gmail dot com
>>>
>>> -- 
>>> Dan Diephouse
>>> MuleSource
>>> http://mulesource.com | http://netzooid.com/blog




