abdera-user mailing list archives

From "Stephen Duncan" <stephen.dun...@gmail.com>
Subject Re: Invalid byte 2 of 3-byte UTF-8 sequence.
Date Tue, 04 Sep 2007 22:08:52 GMT
I don't see the problem (i.e. I changed the one assertion you mentioned in
the comments & commented out the abdera-extensions dependency that doesn't
exist anymore, and the test passed).  I'm using:

java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

On Kubuntu.
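
For anyone else comparing results across platforms as in this thread, a small sketch like the following prints the JVM details together with the default charset. The class name `EnvReport` is made up for illustration. Whether the default charset plays any role in this bug is not established here; it only matters where code converts between bytes and characters without naming a charset explicitly.

```java
import java.nio.charset.Charset;

public class EnvReport {

    // Collects the JVM/OS properties that tend to differ between test machines.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
             + ", java.vendor=" + System.getProperty("java.vendor")
             + ", os.name=" + System.getProperty("os.name")
             + ", file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it on each machine and posting the one-line result next to the test outcome makes the reports easier to compare.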

-Stephen

On 9/4/07, Chris Berry <chriswberry@gmail.com> wrote:
>
> hmmmm.
> The Sun vs IBM JDK is worth a try...
>
> On Sep 4, 2007, at 2:56 PM, James M Snell wrote:
>
> > Heh.. figures, one platform I can't test.  I can confirm that I am not
> > seeing this error at all on Windows or Ubuntu using the IBM JDK and
> > Woodstox or the WAS stax parser.  I haven't tried the Sun JDK yet.
> >
> > - James
> >
> > Chris Berry wrote:
> >> Macbook Pro -- MAC OS-X 10.3
> >>
> >> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ uname -a
> >> Darwin dogstar.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
> >> 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
> >>
> >> dogstar:~/homeaway/pstore/working-NewAbdera-test cberry$ java -version
> >> java version "1.5.0_07"
> >> Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-164)
> >> Java HotSpot(TM) Client VM (build 1.5.0_07-87, mixed mode, sharing)
> >>
> >> Thanks,
> >> -- Chris
> >>
> >> On Sep 4, 2007, at 2:19 PM, James M Snell wrote:
> >>
> >>> Hmmm... well, I ran your test cases and have not been able to recreate
> >>> the issue at all.  I'm running on Ubuntu with the IBM JDK 1.5, tried the
> >>> latest woodstox and the stax parser that ships with Websphere, and was
> >>> completely unable to get the test to throw any kind of UTF-8 related
> >>> errors.
> >>>
> >>> What operating system are you testing on?  What JDK?
> >>>
> >>> - James
> >>>
> >>> Chris Berry wrote:
> >>>> I added the following JUnit (to the JIRA), which I think proves that
> >>>> woodstox 3.2.1 is not the issue.
> >>>> It passes fine (no Exceptions thrown).
> >>>> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
> >>>> Cheers,
> >>>> -- Chris
> >>>> ===================================
> >>>> package com.homeaway.hcdata.store.provider.blogs;
> >>>>
> >>>> import junit.framework.Test;
> >>>> import junit.framework.TestCase;
> >>>> import junit.framework.TestSuite;
> >>>>
> >>>> import javax.xml.stream.XMLStreamReader;
> >>>> import javax.xml.stream.XMLInputFactory;
> >>>>
> >>>> import java.io.FileInputStream;
> >>>>
> >>>> import com.ctc.wstx.stax.WstxInputFactory;
> >>>>
> >>>> public class WoodstoxTest extends TestCase {
> >>>>
> >>>>     private static final String userdir = System.getProperty(
> >>>> "user.dir" );
> >>>>
> >>>>     public static Test suite()
> >>>>     { return new TestSuite( WoodstoxTest.class ); }
> >>>>
> >>>>     public void tearDown() throws Exception
> >>>>     { super.tearDown(); }
> >>>>
> >>>>     public void setUp() throws Exception
> >>>>     { super.setUp(); }
> >>>>
> >>>>     public void testWoodstox() throws Exception {
> >>>>
> >>>>         String filename = userdir +
> >>>> "/var/blogs/cberry/99/9999/en/blog_9999.xml" ;
> >>>>
> >>>>         // we will simply walk the doc and see if it throws an Exception
> >>>>         XMLInputFactory xif = new WstxInputFactory();
> >>>>         XMLStreamReader r = xif.createXMLStreamReader(new
> >>>> FileInputStream( filename ));
> >>>>         while (r.hasNext()) r.next();
> >>>>     }
> >>>> }
> >>>>
> >>>>
> >>>> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
> >>>>
> >>>>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
> >>>>> I have no idea what's causing this error, but I highly doubt it's
> >>>>> woodstox. Woodstox is the most highly conformant xml parser out there.
> >>>>> (but I could be wrong)
> >>>>>
> >>>>> I would strongly suggest avoiding 2.0.5 though, for a number of
> >>>>> reasons:
> >>>>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
> >>>>> tested with 2.x and it expects the stax api to react a certain way
> >>>>> - 3.x is faster
> >>>>> - 3.x has improved xml conformance
> >>>>>
> >>>>> I stepped through the test case a little and wasn't able to see what
> >>>>> was going on right away. I would need to get the AXIOM sources to really
> >>>>> dig in more - I suspect the bug might lie in there after a little bit
> >>>>> of digging, but that is because that's the place I haven't looked yet.
> >>>>>
> >>>>> Any chance you could catch the message being sent from the
> >>>>> server with
> >>>>> something like TCPMon and post it to the JIRA issue?
> >>>>>
> >>>>> - Dan
> >>>>>
> >>>>> Chris Berry wrote:
> >>>>> That fixes it!!!
> >>>>>
> >>>>> I modified all of the pertinent POMs accordingly;
> >>>>> I.e.
> >>>>> <!--
> >>>>>       <dependency>
> >>>>>         <groupId>org.codehaus.woodstox</groupId>
> >>>>>         <artifactId>wstx-asl</artifactId>
> >>>>>         <version>3.2.1</version>
> >>>>>         <scope>runtime</scope>
> >>>>>       </dependency>
> >>>>> -->
> >>>>>       <dependency>
> >>>>>         <groupId>woodstox</groupId>
> >>>>>         <artifactId>wstx-asl</artifactId>
> >>>>>         <version>2.0.5</version>
> >>>>>         <scope>runtime</scope>
> >>>>>       </dependency>
> >>>>>
> >>>>> 9 POMs were affected:
> >>>>>
> >>>>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find . -name "*.xml" | xargs grep woodstox
> >>>>> ./extensions/gdata/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/geo/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/json/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/main/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/media/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/opensearch/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./extensions/sharing/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./parser/pom.xml:      <groupId>org.codehaus.woodstox</groupId>
> >>>>> ./pom.xml:        <groupId>org.codehaus.woodstox</groupId>
> >>>>>
> >>>>> I will add this info to the JIRA.
> >>>>>
> >>>>> James,
> >>>>> Can we move the SVN Head back to 2.0.5 until this is resolved??
> >>>>>
> >>>>> FYI: we are using woodstox 3.2.1 in another project with these
> >>>>> exact
> >>>>> same XMLs without problem??
> >>>>>
> >>>>> Thanks much,
> >>>>> -- Chris
> >>>>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
> >>>>>
> >>>>> I will try that. I didn't before, because I wasn't sure that it
> >>>>> wasn't required somehow internally...
> >>>>>
> >>>>> BTW: I ran these XML documents with the supposed invalid chars
> >>>>> thru 2
> >>>>> different UTF-8 conversions as I read them from disk, before
> >>>>> putting
> >>>>> them into the <content>
> >>>>> And I also processed them with the Unix "iconv" utility
> >>>>> So I am pretty darn sure that there are no invalid chars in there.
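
A Java-side counterpart to the iconv check above is a strict decode that reports malformed input instead of silently replacing it. The sketch below is only an illustration (the class and method names are made up). It shows one common way to end up with a broken multi-byte sequence: text encoded as ISO-8859-1 but read as UTF-8, where `0xE4` looks like the lead byte of a 3-byte sequence and the following ASCII byte is not a valid continuation. Nothing in this thread confirms that this is what happens here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Bär" as UTF-8 is 42 C3 A4 72; as ISO-8859-1 it is 42 E4 72.
        byte[] good = "B\u00e4r".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "B\u00e4r".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

Feeding the bytes of the request body (captured on the wire, not re-read from disk) through `isValidUtf8` would show whether the bytes are already broken before the server parses them.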
> >>>>>
> >>>>> Cheers,
> >>>>> -- Chris
> >>>>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
> >>>>>
> >>>>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require*
> >>>>> the new
> >>>>> version of woodstox.  If dropping down to an older version
> >>>>> addresses the
> >>>>> issue, then we can explore that as a solution.
> >>>>>
> >>>>> - James
> >>>>>
> >>>>> Chris Berry wrote:
> >>>>> Hmmm.
> >>>>> FYI:  I saw a similar problem with an earlier 0.3. I was mixing
> >>>>> the
> >>>>> latest woodstox with Abdera
> >>>>> Or more correctly, maven was bringing in some chained
> >>>>> dependencies --
> >>>>> one of which brought in woodstox 3.2.1.
> >>>>> Abdera was using woodstox 2.0.5 at that time.
> >>>>> The problem went away when I corrected this problem....
> >>>>>
> >>>>> Note, if this is your problem, you can work around it with the maven
> >>>>> <exclusions> element
> >>>>> e.g.
> >>>>>         <dependency>
> >>>>>           <groupId>com.whatever</groupId>
> >>>>>           <artifactId>foo</artifactId>
> >>>>>           <version>1.2.3</version>
> >>>>>           <exclusions>
> >>>>>             <exclusion>
> >>>>>               <groupId>org.codehaus.woodstox</groupId>
> >>>>>               <artifactId>wstx-lgpl</artifactId>
> >>>>>             </exclusion>
> >>>>>           </exclusions>
> >>>>>         </dependency>
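
To check which StAX implementation actually wins on the classpath when chained dependencies pull in more than one woodstox version, something like this sketch may help. `StaxImplReport` is a hypothetical name. It relies only on the JDK's standard `XMLInputFactory.newInstance()` lookup and reports where the chosen factory class was loaded from. Code that instantiates a specific factory directly bypasses this lookup.

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

public class StaxImplReport {

    // Describes a class by name plus the jar or directory it was loaded from.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(JDK built-in or unknown)"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + where;
    }

    public static void main(String[] args) {
        // Whatever factory the standard StAX lookup selects on this classpath.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println(describe(factory.getClass()));
    }
}
```

If the reported jar is not the woodstox version the POM asks for, a transitive dependency is the likely source, and the exclusion above is one way to deal with it.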
> >>>>>
> >>>>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is
> >>>>> related to
> >>>>> the woodstox upgrade....
> >>>>>
> >>>>> Cheers,
> >>>>> -- Chris
> >>>>>
> >>>>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
> >>>>>
> >>>>> Hi Chris!
> >>>>>
> >>>>> Thanks for your feedback!
> >>>>>
> >>>>> This is exactly the bug I am seeing.
> >>>>> AFAICT, it is not related to a missing <?xml version="1.0"
> >>>>> encoding="UTF-8"?>.
> >>>>> Incidentally, my code worked fine before a recent "svn up" and it has
> >>>>> no <?xml version="1.0" encoding="UTF-8"?>.
> >>>>>
> >>>>> If I understand your problem correctly, it occurs if you parse an
> >>>>> entry with an AbderaClient (i.e. calling "entry.getContent()"),
> >>>>> right?
> >>>>>
> >>>>> Mine occurs if I use an AbderaClient to create an entry on an
> >>>>> external server, which is btw a proprietary closed-source thingy. The
> >>>>> server then gives me the error message while it tries to parse my
> >>>>> request.
> >>>>>
> >>>>> It seems that knowing that another person is seeing the issue
> >>>>> confirms that the issue is on Abdera's side...
> >>>>>
> >>>>> I'm not sure if we both encounter the same problem. My problem also
> >>>>> occurs with the AbderaClient 0.22. Yours occurred only after updating to
> >>>>> 0.30-snapshot, right?
> >>>>>
> >>>>> I haven't the slightest idea whether the problem lies in my code, in
> >>>>> the abdera-code or even in the server-code.
> >>>>>
> >>>>> My next test would be to create an atom-entry by hand without
> >>>>> the AbderaClient and provide an "<?xml version="1.0"
> >>>>> encoding="UTF-8"?>" to check how the server reacts.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Herbert
> >>>>>
> >>>>>
> >>>>>
> >>>>> S'all good  ---   chriswberry at gmail dot com
> >>>>>
> >>>>> --
> >>>>> Dan Diephouse
> >>>>> MuleSource
> >>>>> http://mulesource.com | http://netzooid.com/blog
> >>>>>


-- 
Stephen Duncan Jr
www.stephenduncanjr.com
