abdera-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Wed, 05 Sep 2007 00:54:55 GMT
Darn!  I tried out JDK 1.6.0 on Mac OS-X 10.4 (Tiger) and it also  
fails....
I don't know if there is an IBM JDK for Macs (I couldn't find one...)

Not sure what to try next....
Thanks,
-- Chris 

On Sep 4, 2007, at 5:08 PM, Stephen Duncan wrote:

> I don't see the problem (i.e. I changed the one assertion you  
> mentioned in
> the comments & commented out the abdera-extensions dependency that  
> doesn't
> exist anymore, and the test passed).  I'm using:
>
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build 1.6.0-b105)
> Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
>
> On Kubuntu.
>
> -Stephen
>
> On 9/4/07, Chris Berry <chriswberry@gmail.com> wrote:
>>
>> hmmmm.
>> The Sun vs IBM JDK is worth a try...
>>
>> On Sep 4, 2007, at 2:56 PM, James M Snell wrote:
>>
>>> Heh.. figures, one platform I can't test.  I can confirm that I  
>>> am not
>>> seeing this error at all on Windows or Ubuntu using the IBM JDK and
>>> Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
>>>
>>> - James
>>>
>>> Chris Berry wrote:
>>>> MacBook Pro -- Mac OS X 10.4
>>>>
>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
>>>> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
>>>>
>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
>>>> java version "1.5.0_07"
>>>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
>>>> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
>>>>
>>>> Thanks,
>>>> -- Chris
>>>>
>>>> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
>>>>
>>>>> Hmmm... well, I ran your test cases and have not been able to
>>>>> recreate
>>>>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5,
>>>>> tried the
>>>>> latest woodstox and the stax parser that ships with Websphere,
>>>>> and was
>>>>> completely unable to get the test to throw any kind of UTF-8  
>>>>> related
>>>>> errors.
>>>>>
>>>>> What operating system are you testing on?  What JDK?
>>>>>
>>>>> - James
>>>>>
>>>>> Chris Berry wrote:
>>>>>> I added the following JUnit (to the JIRA), which I think proves
>>>>>> that
>>>>>> woodstox 3.2.1 is not the issue.
>>>>>> It passes fine (no Exceptions thrown).
>>>>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>>>>> Cheers,
>>>>>> -- Chris
>>>>>> ===================================
>>>>>> package com.homeaway.hcdata.store.provider.blogs;
>>>>>>
>>>>>> import junit.framework.Test;
>>>>>> import junit.framework.TestCase;
>>>>>> import junit.framework.TestSuite;
>>>>>>
>>>>>> import javax.xml.stream.XMLStreamReader;
>>>>>> import javax.xml.stream.XMLInputFactory;
>>>>>>
>>>>>> import java.io.FileInputStream;
>>>>>>
>>>>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>>>>
>>>>>> public class WoodstoxTest extends TestCase {
>>>>>>
>>>>>>     private static final String userdir = System.getProperty( "user.dir" );
>>>>>>
>>>>>>     public static Test suite()
>>>>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>>>>
>>>>>>     public void tearDown() throws Exception
>>>>>>     { super.tearDown(); }
>>>>>>
>>>>>>     public void setUp() throws Exception
>>>>>>     { super.setUp(); }
>>>>>>
>>>>>>     public void testWoodstox() throws Exception {
>>>>>>
>>>>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml" ;
>>>>>>
>>>>>>         // we will simply walk the doc and see if it throws an Exception
>>>>>>         XMLInputFactory xif = new WstxInputFactory();
>>>>>>         FileInputStream in = new FileInputStream( filename );
>>>>>>         try {
>>>>>>             XMLStreamReader r = xif.createXMLStreamReader( in );
>>>>>>             while (r.hasNext()) r.next();
>>>>>>         } finally {
>>>>>>             in.close();  // release the file handle even if parsing fails
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
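If the walk above ever does fail, it helps to know where in the document the parser gave up. Here is a minimal sketch of the same "walk the doc" idea that reports the failure position instead of just throwing. It is an illustration under assumptions, not the test attached to the JIRA: the class name ParseLocator and the method firstError are hypothetical, and it uses XMLInputFactory.newInstance() (whatever StAX implementation is first on the classpath) rather than WstxInputFactory so that it runs without Woodstox.

```java
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.InputStream;

// Hypothetical helper: walks a document and reports where parsing failed.
final class ParseLocator {

    /** Returns null if the whole document parses, else a description of the first failure. */
    static String firstError(InputStream in) {
        XMLStreamReader reader = null;
        try {
            // Assumption: use the default StAX implementation found at runtime.
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                reader.next();
            }
            return null;
        } catch (XMLStreamException e) {
            Location loc = e.getLocation();
            String where = (loc == null)
                    ? "unknown location"
                    : "line " + loc.getLineNumber() + ", column " + loc.getColumnNumber();
            return where + ": " + e.getMessage();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if close itself fails
                }
            }
        }
    }
}
```

Calling firstError(new FileInputStream(filename)) on the blog file would give either null or a line/column to look at in a hex editor.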
>>>>>>
>>>>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>>>>
>>>>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>>>>> I have no idea what's causing this error, but I highly doubt it's
>>>>>>> Woodstox. Woodstox is the most highly conformant XML parser out
>>>>>>> there.
>>>>>>> (but I could be wrong)
>>>>>>>
>>>>>>> I would strongly suggest avoiding 2.0.5 though, for a number of
>>>>>>> reasons:
>>>>>>> - 3.x has many StAX conformance improvements. AXIOM hasn't really
>>>>>>> been tested with 2.x and it expects the StAX API to react a
>>>>>>> certain way
>>>>>>> - 3.x is faster
>>>>>>> - 3.x has improved XML conformance
>>>>>>>
>>>>>>> I stepped through the test case a little and wasn't able to see
>>>>>>> what was going on right away. I would need to get the AXIOM
>>>>>>> sources to really dig in more. After a little bit of digging I
>>>>>>> suspect the bug might lie in there, but only because that's the
>>>>>>> place I haven't looked yet.
>>>>>>>
>>>>>>> Any chance you could catch the message being sent from the
>>>>>>> server with
>>>>>>> something like TCPMon and post it to the JIRA issue?
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> Chris Berry wrote:
>>>>>>> That fixes it!!!
>>>>>>>
>>>>>>> I modified all of the pertinent POMs accordingly;
>>>>>>> I.e.
>>>>>>> <!--
>>>>>>>       <dependency>
>>>>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>         <version>3.2.1</version>
>>>>>>>         <scope>runtime</scope>
>>>>>>>       </dependency>
>>>>>>> -->
>>>>>>>       <dependency>
>>>>>>>         <groupId>woodstox</groupId>
>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>         <version>2.0.5</version>
>>>>>>>         <scope>runtime</scope>
>>>>>>>       </dependency>
>>>>>>>
>>>>>>> 9 POMs were affected:
>>>>>>>
>>>>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>>>>
>>>>>>> I will add this info to the JIRA.
>>>>>>>
>>>>>>> James,
>>>>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>>>>
>>>>>>> FYI: we are using woodstox 3.2.1 in another project with these
>>>>>>> exact
>>>>>>> same XMLs without problem??
>>>>>>>
>>>>>>> Thanks much,
>>>>>>> -- Chris
>>>>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>>>>
>>>>>>> I will try that. I didn't before, because I wasn't sure that it
>>>>>>> wasn't required somehow internally...
>>>>>>>
>>>>>>> BTW: I ran these XML documents with the supposedly invalid chars
>>>>>>> through 2 different UTF-8 conversions as I read them from disk,
>>>>>>> before putting them into the <content>.
>>>>>>> And I also processed them with the Unix "iconv" utility.
>>>>>>> So I am pretty darn sure that there are no invalid chars in there.
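The "no invalid chars" check can also be done from Java rather than with iconv. Below is a minimal sketch, assuming a modern JDK (StandardCharsets and java.nio.file are Java 7+, so newer than the JDKs discussed in this thread). The class name Utf8Check is hypothetical. The important detail is CodingErrorAction.REPORT: the usual new String(bytes, "UTF-8") silently replaces bad bytes, so it cannot prove anything about validity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: strict UTF-8 validation of raw bytes.
final class Utf8Check {

    /** Returns true only if the bytes decode as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of substituting U+FFFD
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8Check path/to/file.xml
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "invalid UTF-8");
    }
}
```

If this reports the file as valid and the parser still complains, the bytes are most likely being altered somewhere between the file and the parser (for example by a Reader/Writer pair using the platform default encoding), not on disk.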
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -- Chris
>>>>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>>>>
>>>>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require*
>>>>>>> the new
>>>>>>> version of woodstox.  If dropping down to an older version
>>>>>>> addresses the
>>>>>>> issue, then we can explore that as a solution.
>>>>>>>
>>>>>>> - James
>>>>>>>
>>>>>>> Chris Berry wrote:
>>>>>>> Hmmm.
>>>>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing
>>>>>>> the latest woodstox with Abdera.
>>>>>>> Or more correctly, maven was bringing in some chained
>>>>>>> dependencies -- one of which brought in woodstox 3.2.1.
>>>>>>> Abdera was using woodstox 2.0.5 at that time.
>>>>>>> The problem went away when I corrected this....
>>>>>>>
>>>>>>> Note, if this is your problem, you can work around it with the
>>>>>>> maven <exclusions> element
>>>>>>> e.g.
>>>>>>>         <dependency>
>>>>>>>           <groupId>com.whatever</groupId>
>>>>>>>           <artifactId>foo</artifactId>
>>>>>>>           <version>1.2.3</version>
>>>>>>>           <exclusions>
>>>>>>>             <exclusion>
>>>>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>>>>             </exclusion>
>>>>>>>           </exclusions>
>>>>>>>         </dependency>
>>>>>>>
>>>>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is
>>>>>>> related to
>>>>>>> the woodstox upgrade....
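Since the suspicion here is about which parser jar actually ends up on the classpath, one quick way to check at runtime is to ask the JVM which XMLInputFactory implementation it loaded and from where. A minimal sketch, with a hypothetical class name StaxImplProbe; it only uses standard JDK calls and makes no claim about what Abdera or Axiom do internally.

```java
import javax.xml.stream.XMLInputFactory;
import java.security.CodeSource;

// Hypothetical helper: shows which StAX implementation is in use and its origin.
final class StaxImplProbe {

    /** Describes a class as "fully.qualified.Name from <jar or directory URL>". */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (the JDK itself) may have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(bootstrap/JDK)"
                : source.getLocation().toString();
        return type.getName() + " from " + location;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println(describe(factory.getClass()));
    }
}
```

Run with the same classpath as the failing test, the jar name in the output should show whether the 2.0.5 or the 3.2.1 artifact (or some other StAX parser entirely) is the one being picked up.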
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -- Chris
>>>>>>>
>>>>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>>>>>
>>>>>>> Hi Chris!
>>>>>>>
>>>>>>> Thanks for your feedback!
>>>>>>>
>>>>>>> This is exactly the bug I am seeing.
>>>>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>>>>> encoding="UTF-8"?>.
>>>>>>> Incidentally, my code worked fine before a recent "svn up" and
>>>>>>> it has no <?xml version="1.0" encoding="UTF-8"?>.
>>>>>>>
>>>>>>> If I understand your problem correctly, it occurs if you parse an
>>>>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"),
>>>>>>> right?
>>>>>>>
>>>>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>>>>> external server, which is btw a proprietary closed-source thingy.
>>>>>>> The server then gives me the error message while it tries to
>>>>>>> parse my request.
>>>>>>>
>>>>>>> It seems that knowing that another person is seeing the issue
>>>>>>> confirms that the issue is on Abdera's side...
>>>>>>>
>>>>>>> I'm not sure if we both encounter the same problem. My problem
>>>>>>> also occurs with the AbderaClient 0.22. Yours occurred only after
>>>>>>> updating to 0.30-snapshot, right?
>>>>>>>
>>>>>>> I haven't the slightest idea whether the problem lies in my
>>>>>>> code, in the Abdera code or even in the server code.
>>>>>>>
>>>>>>> My next test would be to create an Atom entry by hand, without
>>>>>>> the AbderaClient, and provide an "<?xml version="1.0"
>>>>>>> encoding="UTF-8"?>" to check how the server reacts.
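One way to build such a hand-made entry, with an explicit XML declaration and bytes that are guaranteed to match it, is the plain StAX writer. This is only a sketch under assumptions: the class name HandBuiltEntry is hypothetical, the id and updated values are placeholders, and how the bytes are then POSTed to the server is left out. The point it shows is passing "UTF-8" both to createXMLStreamWriter and to writeStartDocument, so the declared and the actual encoding cannot drift apart.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.ByteArrayOutputStream;

// Hypothetical helper: serializes a tiny Atom entry without AbderaClient.
final class HandBuiltEntry {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the entry as UTF-8 bytes, starting with an XML declaration. */
    static byte[] build(String title, String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Same encoding for the byte stream and for the declaration.
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("entry");
            w.writeDefaultNamespace(ATOM_NS);
            writeText(w, "title", title);
            writeText(w, "id", "urn:example:entry-1");          // placeholder id
            writeText(w, "updated", "2007-09-04T00:00:00Z");    // placeholder date
            w.writeStartElement("content");
            w.writeAttribute("type", "text");
            w.writeCharacters(content);
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not serialize entry", e);
        }
        return out.toByteArray();
    }

    private static void writeText(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

Sending build("some title", "text with umlauts: äöü") and comparing the server's reaction with what it does for the AbderaClient request would show whether the declaration makes any difference.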
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Herbert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dan Diephouse
>>>>>>> MuleSource
>>>>>>> http://mulesource.com | http://netzooid.com/blog
>>>>>>>
>
>
> -- 
> Stephen Duncan Jr
> www.stephenduncanjr.com

S'all good  ---   chriswberry at gmail dot com



