abdera-user mailing list archives

From Chris Berry <chriswbe...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Wed, 05 Sep 2007 03:23:21 GMT
I wonder if this is because the default JVM encoding on Linux is (IIRC)
UTF-8, but on Macs it is MacRoman??

Note: changing "file.encoding" doesn't help; it doesn't fix the problem.
I've read that the property is officially read-only...
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4163515
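
To make the default-encoding theory concrete, here is a minimal sketch (not code from Abdera or Axiom) showing how the platform default can be inspected and how the same text turns into different bytes under different charsets. The class name DefaultCharsetDemo and the sample string are made up for illustration, ISO-8859-1 stands in for "some single-byte platform encoding" because MacRoman is not guaranteed to be available in every JDK, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {

    // Encode with an explicitly named charset instead of the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Charset used by String.getBytes(), FileReader, etc. when none is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        String text = "caf\u00e9"; // ends in U+00E9 (e with acute accent)
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // UTF-8 uses two bytes for U+00E9 (0xC3 0xA9); ISO-8859-1 uses one (0xE9).
        System.out.println("UTF-8 bytes:      " + Arrays.toString(utf8));
        System.out.println("ISO-8859-1 bytes: " + Arrays.toString(latin1));

        // Reading the single-byte form back as UTF-8 does not round-trip:
        // 0xE9 looks like the lead byte of a 3-byte sequence that never completes.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("round-trips as UTF-8: " + text.equals(misread));
    }
}
```

If bytes written with a single-byte default are later parsed as UTF-8, a lone 0xE9 followed by ordinary characters is the kind of input a strict parser could reject as a broken 3-byte sequence. Whether that is what happens in this bug is still only a guess; passing the charset explicitly wherever bytes and characters are converted at least removes the platform default from the picture.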

On Sep 4, 2007, at 7:54 PM, Chris Berry wrote:

> Darn!  I tried out JDK 1.6.0 on Mac OS-X 10.4 (Tiger) and it also  
> fails....
> I don't know if there is an IBM JDK for Macs (I couldn't find one...)
>
> Not sure what to try next....
> Thanks,
> -- Chris 
>
> On Sep 4, 2007, at 5:08 PM, Stephen Duncan wrote:
>
>> I don't see the problem (i.e. I changed the one assertion you mentioned in
>> the comments & commented out the abdera-extensions dependency that doesn't
>> exist anymore, and the test passed).  I'm using:
>>
>> java version "1.6.0"
>> Java(TM) SE Runtime Environment (build 1.6.0-b105)
>> Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
>>
>> On Kubuntu.
>>
>> -Stephen
>>
>> On 9/4/07, Chris Berry <chriswberry@gmail.com> wrote:
>>>
>>> hmmmm.
>>> The Sun vs IBM JDK is worth a try...
>>>
>>> On Sep 4, 2007, at 2:56 PM, James M Snell wrote:
>>>
>>>> Heh.. figures, one platform I can't test.  I can confirm that I am not
>>>> seeing this error at all on Windows or Ubuntu using the IBM JDK and
>>>> Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
>>>>
>>>> - James
>>>>
>>>> Chris Berry wrote:
>>>>> Macbook Pro -- Mac OS X 10.4 (Darwin 8.10.1, per the uname output)
>>>>>
>>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
>>>>> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
>>>>>
>>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
>>>>> java version "1.5.0_07"
>>>>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
>>>>> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
>>>>>
>>>>> Thanks,
>>>>> -- Chris
>>>>>
>>>>> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
>>>>>
>>>>>> Hmmm... well, I ran your test cases and have not been able to recreate
>>>>>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
>>>>>> latest woodstox and the stax parser that ships with Websphere, and was
>>>>>> completely unable to get the test to throw any kind of UTF-8 related
>>>>>> errors.
>>>>>>
>>>>>> What operating system are you testing on?  What JDK?
>>>>>>
>>>>>> - James
>>>>>>
>>>>>> Chris Berry wrote:
>>>>>>> I added the following JUnit (to the JIRA), which I think proves that
>>>>>>> woodstox 3.2.1 is not the issue.
>>>>>>> It passes fine (no Exceptions thrown).
>>>>>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>>>>>> Cheers,
>>>>>>> -- Chris
>>>>>>> ===================================
>>>>>>> package com.homeaway.hcdata.store.provider.blogs;
>>>>>>>
>>>>>>> import junit.framework.Test;
>>>>>>> import junit.framework.TestCase;
>>>>>>> import junit.framework.TestSuite;
>>>>>>>
>>>>>>> import javax.xml.stream.XMLStreamReader;
>>>>>>> import javax.xml.stream.XMLInputFactory;
>>>>>>>
>>>>>>> import java.io.FileInputStream;
>>>>>>>
>>>>>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>>>>>
>>>>>>> public class WoodstoxTest extends TestCase {
>>>>>>>
>>>>>>>     private static final String userdir = System.getProperty( "user.dir" );
>>>>>>>
>>>>>>>     public static Test suite()
>>>>>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>>>>>
>>>>>>>     public void tearDown() throws Exception
>>>>>>>     { super.tearDown(); }
>>>>>>>
>>>>>>>     public void setUp() throws Exception
>>>>>>>     { super.setUp(); }
>>>>>>>
>>>>>>>     public void testWoodstox() throws Exception {
>>>>>>>
>>>>>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
>>>>>>>
>>>>>>>         // we will simply walk the doc and see if it throws an Exception
>>>>>>>         XMLInputFactory xif = new WstxInputFactory();
>>>>>>>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>>>>>>>         while (r.hasNext()) r.next();
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>>>>>
>>>>>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>>>>>> I have no idea what's causing this error, but I highly doubt it's
>>>>>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
>>>>>>>> (but I could be wrong)
>>>>>>>>
>>>>>>>> I would strongly suggest avoiding using 2.0.5 though for a number of
>>>>>>>> reasons:
>>>>>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>>>>>> tested with 2.x and it expects the stax api to react a certain way
>>>>>>>> - 3.x is faster
>>>>>>>> - 3.x has improved xml conformance
>>>>>>>>
>>>>>>>> I stepped through the test case a little and wasn't able to see what
>>>>>>>> was going on right away. I would need to get the AXIOM sources to really
>>>>>>>> dig in more - I suspect the bug might lie in there after a little bit
>>>>>>>> of digging, but that is because that's the place I haven't looked yet.
>>>>>>>>
>>>>>>>> Any chance you could catch the message being sent from the
>>>>>>>> server with
>>>>>>>> something like TCPMon and post it to the JIRA issue?
>>>>>>>>
>>>>>>>> - Dan
>>>>>>>>
>>>>>>>> Chris Berry wrote:
>>>>>>>> That fixes it!!!
>>>>>>>>
>>>>>>>> I modified all of the pertinent POMs accordingly;
>>>>>>>> I.e.
>>>>>>>> <!--
>>>>>>>>       <dependency>
>>>>>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>>         <version>3.2.1</version>
>>>>>>>>         <scope>runtime</scope>
>>>>>>>>       </dependency>
>>>>>>>> -->
>>>>>>>>       <dependency>
>>>>>>>>         <groupId>woodstox</groupId>
>>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>>         <version>2.0.5</version>
>>>>>>>>         <scope>runtime</scope>
>>>>>>>>       </dependency>
>>>>>>>>
>>>>>>>> 9 POMs were affected::
>>>>>>>>
>>>>>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>>>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>
>>>>>>>> I will add this info to the JIRA.
>>>>>>>>
>>>>>>>> James,
>>>>>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>>>>>
>>>>>>>> FYI: we are using woodstox 3.2.1 in another project with these exact
>>>>>>>> same XMLs without problem??
>>>>>>>>
>>>>>>>> Thanks much,
>>>>>>>> -- Chris
>>>>>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>>>>>
>>>>>>>> I will try that. I didn't before, because I wasn't sure that it
>>>>>>>> wasn't required somehow internally...
>>>>>>>>
>>>>>>>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>>>>>>>> different UTF-8 conversions as I read them from disk, before putting
>>>>>>>> them into the <content>
>>>>>>>> And I also processed them with the Unix "iconv" utility
>>>>>>>> So I am pretty darn sure that there are no invalid chars in there.
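
For anyone who wants to repeat that kind of check from Java rather than iconv, the following is a rough sketch of a strict UTF-8 validity test. It is not the conversion code used above; the class name Utf8Check and the command-line handling are invented for illustration, and it assumes Java 7 or later for StandardCharsets and java.nio.file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

A file that passes this check on disk can still end up as bad bytes on the wire if it is decoded or re-encoded with a different charset somewhere along the way, so a clean result here narrows the search rather than ending it.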
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -- Chris
>>>>>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>>>>>
>>>>>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require*
>>>>>>>> the new
>>>>>>>> version of woodstox.  If dropping down to an older version
>>>>>>>> addresses the
>>>>>>>> issue, then we can explore that as a solution.
>>>>>>>>
>>>>>>>> - James
>>>>>>>>
>>>>>>>> Chris Berry wrote:
>>>>>>>> Hmmm.
>>>>>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>>>>>>> latest woodstox with Abdera
>>>>>>>> Or more correctly, maven was bringing in some chained dependencies --
>>>>>>>> one of which brought in woodstox 3.2.1.
>>>>>>>> Abdera was using woodstox 2.0.5 at that time.
>>>>>>>> The problem went away when I corrected this problem....
>>>>>>>>
>>>>>>>> Note, if this is your problem, you can work around it with the maven
>>>>>>>> <exclusions> element
>>>>>>>> e.g.
>>>>>>>>         <dependency>
>>>>>>>>           <groupId>com.whatever</groupId>
>>>>>>>>           <artifactId>foo</artifactId>
>>>>>>>>           <version>1.2.3</version>
>>>>>>>>           <exclusions>
>>>>>>>>             <exclusion>
>>>>>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>>>>>             </exclusion>
>>>>>>>>           </exclusions>
>>>>>>>>         </dependency>
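
When chained dependencies are in play, it can help to confirm which StAX implementation actually wins at runtime. The sketch below uses only the standard javax.xml.stream and java.security APIs; the class name WhichStax is made up, and the output depends entirely on the local classpath.

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

public class WhichStax {

    // Describes the StAX implementation class and, if known, where it was loaded from.
    static String describe(XMLInputFactory factory) {
        Class<?> impl = factory.getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        // Classes that ship with the JDK typically report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(JDK built-in or unknown)"
                : source.getLocation().toString();
        return impl.getName() + " from " + location;
    }

    public static void main(String[] args) {
        // newInstance() follows the standard StAX lookup, so this reports whichever
        // implementation the classpath (and any system property) selects.
        System.out.println(describe(XMLInputFactory.newInstance()));
    }
}
```

Run with the same classpath as the failing test, this should show whether a 2.x or 3.x woodstox jar is being picked up, assuming the jar name carries its version.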
>>>>>>>>
>>>>>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is
>>>>>>>> related to the woodstox upgrade....
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> -- Chris
>>>>>>>>
>>>>>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de <mailto:Iops@gmx.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Chris!
>>>>>>>>
>>>>>>>> Thanks for your feedback!
>>>>>>>>
>>>>>>>> This is exactly the bug I am seeing.
>>>>>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>>>>>> encoding="UTF-8"?>,
>>>>>>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>>>>>>> no <?xml version="1.0" encoding="UTF-8"?>.
>>>>>>>>
>>>>>>>> If I understand your problem correctly, it occurs if you parse an
>>>>>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"),
>>>>>>>> right?
>>>>>>>>
>>>>>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>>>>>> external server, which is btw a proprietary closed-source thingy. The
>>>>>>>> server then gives me the error message while it tries to parse my
>>>>>>>> request.
>>>>>>>>
>>>>>>>> It seems that knowing that another person is seeing the issue
>>>>>>>> confirms that the issue is on Abdera's side...
>>>>>>>>
>>>>>>>> I'm not sure if we both encounter the same problem. My problem also
>>>>>>>> occurs with the AbderaClient 0.22. Yours occurred only after updating
>>>>>>>> to 0.30-snapshot, right?
>>>>>>>>
>>>>>>>> I haven't the slightest idea whether the problem lies in my code, in
>>>>>>>> the abdera-code or even in the server-code.
>>>>>>>>
>>>>>>>> My next test would be the creation of an atom-entry by hand
>>>>>>>> without
>>>>>>>> the AbderaClient and provide an "<?xml version="1.0"
>>>>>>>> encoding="UTF-8"?>" to check how the server reacts.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Herbert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dan Diephouse
>>>>>>>> MuleSource
>>>>>>>> http://mulesource.com | http://netzooid.com/blog
>>>>>>>>
>>>>>>>>
>>>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>>>>
>>>>>>>
>>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>
>>>>>
>>>>>
>>>
>>> S'all good  ---   chriswberry at gmail dot com
>>>
>>>
>>>
>>>
>>
>>
>> -- 
>> Stephen Duncan Jr
>> www.stephenduncanjr.com
>
> S'all good  ---   chriswberry at gmail dot com
>
>
>

S'all good  ---   chriswberry at gmail dot com



