Message-ID: <46DDAFA8.2050501@gmail.com>
Date: Tue, 04 Sep 2007 12:19:04 -0700
From: James M Snell
To: abdera-user@incubator.apache.org
CC: Chris Berry
Subject: Re: Invalid byte 2 of 3-byte UTF-8 sequence.
References: <20070904122343.116970@gmx.net> <46C9E0E2-2002-4B19-B83C-1226C9D03AC7@gmail.com> <20070904135931.174410@gmx.net> <6EEAEA4C-1776-46A8-994E-A6A57F9983C6@gmail.com> <46DD6B06.5010208@gmail.com> <9043128E-0480-4BB1-AAEB-B74129A3E253@gmail.com> <46DD84DD.9@mulesource.com> <28338055-A15D-483E-B1B6-1BDF9D64D36D@gmail.com> <3396B30F-DA8B-4CAA-92AB-6090B66DA3B0@gmail.com>
In-Reply-To: <3396B30F-DA8B-4CAA-92AB-6090B66DA3B0@gmail.com>

Hmmm... well, I ran your test cases and have not been able to recreate
the issue at all. I'm running on Ubuntu with the IBM JDK 1.5, tried the
latest woodstox and the stax parser that ships with Websphere, and was
completely unable to get the test to throw any kind of UTF-8 related
errors.

What operating system are you testing on? What JDK?

- James

Chris Berry wrote:
> I added the following JUnit (to the JIRA), which I think proves that
> woodstox 3.2.1 is not the issue.
> It passes fine (no Exceptions thrown).
> So (AFAICT) the issue is somewhere else (Abdera or Axiom??)
> Cheers,
> -- Chris
> ===================================
> package com.homeaway.hcdata.store.provider.blogs;
>
> import junit.framework.Test;
> import junit.framework.TestCase;
> import junit.framework.TestSuite;
>
> import javax.xml.stream.XMLStreamReader;
> import javax.xml.stream.XMLInputFactory;
>
> import java.io.FileInputStream;
>
> import com.ctc.wstx.stax.WstxInputFactory;
>
> public class WoodstoxTest extends TestCase {
>
>     private static final String userdir = System.getProperty( "user.dir" );
>
>     public static Test suite()
>     { return new TestSuite( WoodstoxTest.class ); }
>
>     public void tearDown() throws Exception
>     { super.tearDown(); }
>
>     public void setUp() throws Exception
>     { super.setUp(); }
>
>     public void testWoodstox() throws Exception {
>
>         String filename = userdir +
>             "/var/blogs/cberry/99/9999/en/blog_9999.xml" ;
>
>         // we will simply walk the doc and see if it throws an Exception
>         XMLInputFactory xif = new WstxInputFactory();
>         XMLStreamReader r = xif.createXMLStreamReader(new
>             FileInputStream( filename ));
>         while (r.hasNext()) r.next();
>     }
> }
>
>
> On Sep 4, 2007, at 12:18 PM, Chris Berry wrote:
>
>> On Sep 4, 2007, at 11:16 AM, Dan Diephouse wrote:
>> I have no idea what's causing this error, but I'm highly doubting it's
>> woodstox. Woodstox is the most highly conformant xml parser out there.
>> (but I could be wrong)
>>
>> I would strongly suggest avoiding using 2.0.5 though for a number of
>> reasons
>> - 3.x has many stax conformance improvements. AXIOM hasn't really been
>> tested with 2.x and it expects the stax api to react a certain way
>> - 3.x is faster
>> - 3.x has improved xml conformance
>>
>> I stepped through the test case a little and wasn't able to see what
>> was going on right away. I would need to get the AXIOM sources to really
>> dig in more - I suspect the bug might lie in there after a little bit
>> of digging, but that is because that's the place I haven't looked yet.
>>
>> Any chance you could catch the message being sent from the server with
>> something like TCPMon and post it to the JIRA issue?
>>
>> - Dan
>>
>> Chris Berry wrote:
>> That fixes it!!!
>>
>> I modified all of the pertinent POMs accordingly;
>> I.e.
>>
>> <dependency>
>>   <groupId>woodstox</groupId>
>>   <artifactId>wstx-asl</artifactId>
>>   <version>2.0.5</version>
>>   <scope>runtime</scope>
>> </dependency>
>>
>> 9 POMs were affected:
>>
>> dogstar:~/java/abdera/svn-head-using-old-woostox/trunk cberry$ find .
>> -name "*.xml" | xargs grep woodstox
>> ./extensions/gdata/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/geo/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/json/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/main/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/media/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/opensearch/pom.xml:
>> <groupId>org.codehaus.woodstox</groupId>
>> ./extensions/sharing/pom.xml:
>> <groupId>org.codehaus.woodstox</groupId>
>> ./parser/pom.xml: <groupId>org.codehaus.woodstox</groupId>
>> ./pom.xml: <groupId>org.codehaus.woodstox</groupId>
>>
>> I will add this info to the JIRA.
>>
>> James,
>> Can we move the SVN Head back to 2.0.5 until this is resolved??
>>
>> FYI: we are using woodstox 3.2.1 in another project with these exact
>> same XMLs without problem??
>>
>> Thanks much,
>> -- Chris
>> On Sep 4, 2007, at 10:04 AM, Chris Berry wrote:
>>
>> I will try that. I didn't before, because I wasn't sure that it
>> wasn't required somehow internally...
>>
>> BTW: I ran these XML documents with the supposed invalid chars thru 2
>> different UTF-8 conversions as I read them from disk, before putting
>> them into the
>> And I also processed them with the Unix "iconv" utility
>> So I am pretty darn sure that there are no invalid chars in there.
>>
>> Cheers,
>> -- Chris
>> On Sep 4, 2007, at 9:26 AM, James M Snell wrote:
>>
>> Well, FWIW, there are no changes in Abdera 0.3.0 that *require* the new
>> version of woodstox. If dropping down to an older version addresses the
>> issue, then we can explore that as a solution.
>>
>> - James
>>
>> Chris Berry wrote:
>> Hmmm.
>> FYI: I saw a similar problem with an earlier 0.3. I was mixing the
>> latest woodstox with Abdera
>> Or more correctly, maven was bringing in some chained dependencies --
>> one of which brought in woodstox 3.2.1.
>> Abdera was using woodstox 2.0.5 at that time.
>> The problem went away when I corrected this problem....
>>
>> Note, if this is your problem, you can work around it with the maven
>> <exclusions> element
>> e.g.
>> <dependency>
>>   <groupId>com.whatever</groupId>
>>   <artifactId>foo</artifactId>
>>   <version>1.2.3</version>
>>   <exclusions>
>>     <exclusion>
>>       <groupId>org.codehaus.woodstox</groupId>
>>       <artifactId>wstx-lgpl</artifactId>
>>     </exclusion>
>>   </exclusions>
>> </dependency>
>>
>> BTW: this is why I suspect that the Abdera 0.3 UTF-8 issue is related to
>> the woodstox upgrade....
>>
>> Cheers,
>> -- Chris
>>
>> On Sep 4, 2007, at 8:59 AM, Iops@gmx.de wrote:
>>
>> Hi Chris!
>>
>> Thanks for your feedback!
>>
>> This is exactly the bug I am seeing.
>> AFAICT, it is not related to a missing <?xml version="1.0"
>> encoding="UTF-8"?>,
>> Incidentally, my code worked fine before a recent "svn up" and it has
>> no XML declaration,
>>
>> If I understand your problem correctly, it occurs if you parse an
>> entry with an AbderaClient (i.e. calling "entry.getContent()"), right?
>>
>> Mine occurs if I use an AbderaClient to create an entry on an
>> external server, which is btw a proprietary closed-source-thingi. The
>> server then gives me the error-message while it tries to parse my
>> request.
>>
>> It seems that knowing that another person is seeing the issue
>> confirms that the issue is on Abdera's side...
>>
>> I'm not sure if we both encounter the same problem. My problem occurs
>> also with the AbderaClient 0.22. Yours occurred only after updating to
>> 0.30-snapshot, right?
>>
>> I haven't the slightest idea whether the problem lies in my code, in
>> the abdera-code or even in the server-code.
>>
>> My next test would be the creation of an atom-entry by hand without
>> the AbderaClient and provide an "<?xml version="1.0"
>> encoding="UTF-8"?>" to check how the server reacts.
>>
>> Regards,
>>
>> Herbert
>>
>> S'all good --- chriswberry at gmail dot com
>>
>> --
>> Dan Diephouse
>> MuleSource
>> http://mulesource.com | http://netzooid.com/blog
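
Editor's sketch, not part of the original thread: Chris reports above that he ran the documents through two UTF-8 conversions and iconv to rule out invalid bytes. A similar check can be done from Java with a strict decoder. The class name `Utf8Check` and method `isValidUtf8` are hypothetical, and the code assumes Java 7 or later (`StandardCharsets`, `Files`), which is newer than the JDK 1.5 mentioned in the thread. It only tests byte-level UTF-8 well-formedness, not XML well-formedness, so a clean result would not by itself rule out a parser or serializer problem.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
final class Utf8Check {

    private Utf8Check() { }

    /** Returns true only if the bytes decode as UTF-8 without any malformed input. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Checks each file named on the command line and prints the result. */
    public static void main(String[] args) throws Exception {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "well-formed UTF-8" : "malformed UTF-8"));
        }
    }
}
```

Running it against the file used in the JUnit test above (for example `java Utf8Check var/blogs/cberry/99/9999/en/blog_9999.xml`) would show whether the bytes on disk are well-formed before any StAX parser touches them.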