abdera-user mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Wed, 05 Sep 2007 03:29:58 GMT
Well, you've proven that the problem does not show up when using
FileInputStream with Woodstox.  Take a look at the code in FOMParser.
When you use the InputStream option, it actually ends up using a Reader.
 See if you can trace out what is going on in that mix to see if it's a
bug with the Reader that is being used.

- James
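
[Editor's note] The Reader path mentioned above can be illustrated in isolation. The sketch below is not taken from FOMParser; it only shows the general failure mode when UTF-8 bytes pass through a Reader that decodes with a different charset. The class name ReaderCharsetDemo and the helper readAll are hypothetical, and ISO-8859-1 is used as a stand-in for a non-UTF-8 platform default such as MacRoman, on the assumption that the default charset is what gets picked up somewhere in that mix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Abdera, Axiom, or Woodstox.
public class ReaderCharsetDemo {

    // Decode every byte of the stream through a Reader that uses the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // U+20AC (euro sign) is encoded as a 3-byte sequence in UTF-8.
        byte[] utf8 = "\u20AC".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        // Assumption: ISO-8859-1 stands in for a non-UTF-8 platform default such as MacRoman.
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println("JVM default charset:   " + Charset.defaultCharset());
        System.out.println("decoded as UTF-8:      " + right.length() + " char(s)");
        System.out.println("decoded as ISO-8859-1: " + wrong.length() + " char(s)");
    }
}
```

With the wrong charset, the single character comes back as three unrelated characters. If that text is later re-encoded or handed to a parser expecting UTF-8, errors of the "Invalid byte 2 of 3-byte UTF-8 sequence" kind become plausible. Whether this is what actually happens here still has to be confirmed by tracing the code.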

Chris Berry wrote:
> I wonder if this is because the default JVM encoding on Linux (IIRC) is
> UTF-8, but it is MacRoman on Macs?
> 
> Note: changing "file.encoding" doesn't help. I've read that it is
> officially read-only...
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4163515
> (It doesn't fix the problem)
> 
> On Sep 4, 2007, at 7:54 PM, Chris Berry wrote:
> 
>> Darn!  I tried out JDK 1.6.0 on Mac OS-X 10.4 (Tiger) and it also fails....
>> I don't know if there is an IBM JDK for Macs (I couldn't find one...)
>>
>> Not sure what to try next....
>> Thanks,
>> -- Chris
>> On Sep 4, 2007, at 5:08 PM, Stephen Duncan wrote:
>>
>>> I don't see the problem (i.e. I changed the one assertion you mentioned in
>>> the comments & commented out the abdera-extensions dependency that doesn't
>>> exist anymore, and the test passed).  I'm using:
>>>
>>> java version "1.6.0"
>>> Java(TM) SE Runtime Environment (build 1.6.0-b105)
>>> Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
>>>
>>> On Kubuntu.
>>>
>>> -Stephen
>>>
>>> On 9/4/07, Chris Berry <chriswberry@gmail.com> wrote:
>>>>
>>>> hmmmm.
>>>> The Sun vs IBM JDK is worth a try...
>>>>
>>>> On Sep 4, 2007, at 2:56 PM, James M Snell wrote:
>>>>
>>>>> Heh.. figures, one platform I can't test.  I can confirm that I am not
>>>>> seeing this error at all on Windows or Ubuntu using the IBM JDK and
>>>>> Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
>>>>>
>>>>> - James
>>>>>
>>>>> Chris Berry wrote:
>>>>>> MacBook Pro -- Mac OS X 10.4
>>>>>>
>>>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
>>>>>> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
>>>>>> 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
>>>>>>
>>>>>> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
>>>>>> java version "1.5.0_07"
>>>>>> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
>>>>>> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
>>>>>>
>>>>>> Thanks,
>>>>>> -- Chris
>>>>>>
>>>>>> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
>>>>>>
>>>>>>> Hmmm... well, I ran your test cases and have not been able to recreate
>>>>>>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
>>>>>>> latest woodstox and the stax parser that ships with Websphere, and was
>>>>>>> completely unable to get the test to throw any kind of UTF-8 related
>>>>>>> errors.
>>>>>>>
>>>>>>> What operating system are you testing on?  What JDK?
>>>>>>>
>>>>>>> - James
>>>>>>>
>>>>>>> Chris Berry wrote:
>>>>>>>> I added the following JUnit (to the JIRA), which I think proves that
>>>>>>>> woodstox 3.2.1 is not the issue.
>>>>>>>> It passes fine (no Exceptions thrown).
>>>>>>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
>>>>>>>> Cheers,
>>>>>>>> -- Chris
>>>>>>>> ===================================
>>>>>>>> package com.homeaway.hcdata.store.provider.blogs;
>>>>>>>>
>>>>>>>> import junit.framework.Test;
>>>>>>>> import junit.framework.TestCase;
>>>>>>>> import junit.framework.TestSuite;
>>>>>>>>
>>>>>>>> import javax.xml.stream.XMLStreamReader;
>>>>>>>> import javax.xml.stream.XMLInputFactory;
>>>>>>>>
>>>>>>>> import java.io.FileInputStream;
>>>>>>>>
>>>>>>>> import com.ctc.wstx.stax.WstxInputFactory;
>>>>>>>>
>>>>>>>> public class WoodstoxTest extends TestCase {
>>>>>>>>
>>>>>>>>     private static final String userdir = System.getProperty( "user.dir" );
>>>>>>>>
>>>>>>>>     public static Test suite()
>>>>>>>>     { return new TestSuite( WoodstoxTest.class ); }
>>>>>>>>
>>>>>>>>     public void tearDown() throws Exception
>>>>>>>>     { super.tearDown(); }
>>>>>>>>
>>>>>>>>     public void setUp() throws Exception
>>>>>>>>     { super.setUp(); }
>>>>>>>>
>>>>>>>>     public void testWoodstox() throws Exception {
>>>>>>>>
>>>>>>>>         String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml" ;
>>>>>>>>
>>>>>>>>         // we will simply walk the doc and see if it throws an Exception
>>>>>>>>         XMLInputFactory xif = new WstxInputFactory();
>>>>>>>>         XMLStreamReader r = xif.createXMLStreamReader(new FileInputStream( filename ));
>>>>>>>>         while (r.hasNext()) r.next();
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>>>>>>>>
>>>>>>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>>>>>>>>> I have no idea what's causing this error, but I highly doubt it's
>>>>>>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
>>>>>>>>> (but I could be wrong)
>>>>>>>>>
>>>>>>>>> I would strongly suggest avoiding using 2.0.5 though for a number of
>>>>>>>>> reasons
>>>>>>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>>>>>>>>>   tested with 2.x and it expects the stax api to react a certain way
>>>>>>>>> - 3.x is faster
>>>>>>>>> - 3.x has improved xml conformance
>>>>>>>>>
>>>>>>>>> I stepped through the test case a little and wasn't able to see what
>>>>>>>>> was going on right away. I would need to get the AXIOM sources to really
>>>>>>>>> dig in more - I suspect the bug might lie in there after a little bit
>>>>>>>>> of digging, but that is because that's the place I haven't looked yet.
>>>>>>>>>
>>>>>>>>> Any chance you could catch the message being sent from the server with
>>>>>>>>> something like TCPMon and post it to the JIRA issue?
>>>>>>>>>
>>>>>>>>> - Dan
>>>>>>>>>
>>>>>>>>> Chris Berry wrote:
>>>>>>>>> That fixes it!!!
>>>>>>>>>
>>>>>>>>> I modified all of the pertinent POMs accordingly;
>>>>>>>>> I.e.
>>>>>>>>> <!--
>>>>>>>>>       <dependency>
>>>>>>>>>         <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>>>         <version>3.2.1</version>
>>>>>>>>>         <scope>runtime</scope>
>>>>>>>>>       </dependency>
>>>>>>>>> -->
>>>>>>>>>       <dependency>
>>>>>>>>>         <groupId>woodstox</groupId>
>>>>>>>>>         <artifactId>wstx-asl</artifactId>
>>>>>>>>>         <version>2.0.5</version>
>>>>>>>>>         <scope>runtime</scope>
>>>>>>>>>       </dependency>
>>>>>>>>>
>>>>>>>>> 9 POMs were affected:
>>>>>>>>>
>>>>>>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
>>>>>>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>>
>>>>>>>>> I will add this info to the JIRA.
>>>>>>>>>
>>>>>>>>> James,
>>>>>>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>>>>>>>>
>>>>>>>>> FYI: we are using woodstox 3.2.1 in another project with these exact
>>>>>>>>> same XMLs without problem??
>>>>>>>>>
>>>>>>>>> Thanks much,
>>>>>>>>> -- Chris
>>>>>>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>>>>>>>>
>>>>>>>>> I will try that. I didn't before, because I wasn't sure that it
>>>>>>>>> wasn't required somehow internally...
>>>>>>>>>
>>>>>>>>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>>>>>>>>> different UTF-8 conversions as I read them from disk, before putting
>>>>>>>>> them into the <content>
>>>>>>>>> And I also processed them with the Unix "iconv" utility
>>>>>>>>> So I am pretty darn sure that there are no invalid chars in there.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> -- Chris
>>>>>>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>>>>>>>>
>>>>>>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>>>>>>>>> version of woodstox.  If dropping down to an older version addresses the
>>>>>>>>> issue, then we can explore that as a solution.
>>>>>>>>>
>>>>>>>>> - James
>>>>>>>>>
>>>>>>>>> Chris Berry wrote:
>>>>>>>>> Hmmm.
>>>>>>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing the
>>>>>>>>> latest woodstox with Abdera
>>>>>>>>> Or more correctly, maven was bringing in some chained dependencies --
>>>>>>>>> one of which brought in woodstox 3.2.1.
>>>>>>>>> Abdera was using woodstox 2.0.5 at that time.
>>>>>>>>> The problem went away when I corrected this problem....
>>>>>>>>>
>>>>>>>>> Note, if this is your problem, you can work around it with the maven
>>>>>>>>> <exclusions> element
>>>>>>>>> e.g.
>>>>>>>>>         <dependency>
>>>>>>>>>           <groupId>com.whatever</groupId>
>>>>>>>>>           <artifactId>foo</artifactId>
>>>>>>>>>           <version>1.2.3</version>
>>>>>>>>>           <exclusions>
>>>>>>>>>             <exclusion>
>>>>>>>>>               <groupId>org.codehaus.woodstox</groupId>
>>>>>>>>>               <artifactId>wstx-lgpl</artifactId>
>>>>>>>>>             </exclusion>
>>>>>>>>>           </exclusions>
>>>>>>>>>         </dependency>
>>>>>>>>>
>>>>>>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>>>>>>>>> the woodstox upgrade....
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> -- Chris
>>>>>>>>>
>>>>>>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>>>>>>>>
>>>>>>>>> Hi Chris!
>>>>>>>>>
>>>>>>>>> Thanks for your feedback!
>>>>>>>>>
>>>>>>>>> This is exactly the bug I am seeing.
>>>>>>>>> AFAICT, it is not related to a missing <?xml version="1.0"
>>>>>>>>> encoding="UTF-8"?>,
>>>>>>>>> Incidentally, my code worked fine before a recent "svn up" and it has
>>>>>>>>> no <?xml version="1.0" encoding="UTF-8"?>,
>>>>>>>>>
>>>>>>>>> If I understand your problem correctly, it occurs if you parse an
>>>>>>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"),
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> Mine occurs if I use an AbderaClient to create an entry on an
>>>>>>>>> external server, which is btw a proprietary closed-source thingy. The
>>>>>>>>> server then gives me the error message while it tries to parse my
>>>>>>>>> request.
>>>>>>>>>
>>>>>>>>> It seems that knowing that another person is seeing the issue
>>>>>>>>> confirms that the issue is on Abdera's side...
>>>>>>>>>
>>>>>>>>> I'm not sure if we both encounter the same problem. My problem occurs
>>>>>>>>> also with the AbderaClient 0.22. Yours occurred only after updating to
>>>>>>>>> 0.30-snapshot, right?
>>>>>>>>>
>>>>>>>>> I haven't the slightest idea whether the problem lies in my code, in
>>>>>>>>> the abdera-code or even in the server-code.
>>>>>>>>>
>>>>>>>>> My next test would be to create an atom-entry by hand without
>>>>>>>>> the AbderaClient and provide an "<?xml version="1.0"
>>>>>>>>> encoding="UTF-8"?>" to check how the server reacts.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Herbert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> S'all good  ---   chriswberry at gmail dot com
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Dan Diephouse
>>>>>>>>> MuleSource
>>>>>>>>> http://mulesource.com | http://netzooid.com/blog
>>>
>>> -- 
>>> Stephen Duncan Jr
>>> www.stephenduncanjr.com