From: "Chris Berry (JIRA)"
To: abdera-dev@incubator.apache.org
Subject: [jira] Commented: (ABDERA-60) Invalid UTF-8 chars in the AbderaClient
Date: Tue, 4 Sep 2007 12:28:45 -0700 (PDT)
Message-ID: <18329062.1188934125878.JavaMail.jira@brutus>
In-Reply-To: <26423230.1188876298234.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/ABDERA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524835 ]

Chris Berry commented on ABDERA-60:
-----------------------------------

Just to be
positive. I have added code to the previous JUnit test that actually retrieves text from the XML with Woodstox. This is pretty unequivocal now...

package com.homeaway.hcdata.store.provider.blogs;

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

import java.io.FileInputStream;

import com.ctc.wstx.stax.WstxInputFactory;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Parses the offending XML file directly with Woodstox, outside of Abdera.
 */
public class WoodstoxTest extends TestCase {

    static private Log log = LogFactory.getLog( WoodstoxTest.class );

    private static final String userdir = System.getProperty( "user.dir" );
    private static final String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";

    public static Test suite() { return new TestSuite( WoodstoxTest.class ); }

    public void tearDown() throws Exception { super.tearDown(); }

    // the original posting called super.tearDown() here
    public void setUp() throws Exception { super.setUp(); }

    public void testWoodstox1() throws Exception {
        // we will simply walk the doc and see if it throws an Exception
        XMLInputFactory xif = new WstxInputFactory();
        XMLStreamReader r = xif.createXMLStreamReader( new FileInputStream( filename ) );
        while (r.hasNext()) r.next();
        r.close();
    }

    public void testWoodstox2() throws Exception {
        // walk the doc again, this time retrieving and logging the text of each event
        XMLInputFactory xif = new WstxInputFactory();
        XMLStreamReader reader = xif.createXMLStreamReader( new FileInputStream( filename ) );
        while ( reader.hasNext() ) {
            printEventInfo( reader );
        }
        reader.close();
    }

    // advances the reader by one event and logs the event type and its text
    private static void printEventInfo(XMLStreamReader reader) throws XMLStreamException {
        int eventCode = reader.next();
        String val = null;
        switch (eventCode) {
            case XMLStreamConstants.START_ELEMENT :          // 1
                val = reader.getLocalName();
                log.debug("event = START_ELEMENT");
                log.debug("Localname = " + val);
                break;
            case XMLStreamConstants.END_ELEMENT :            // 2
                val = reader.getLocalName();
                log.debug("event = END_ELEMENT");
                log.debug("Localname = " + val);
                break;
            case XMLStreamConstants.PROCESSING_INSTRUCTION : // 3
                val = reader.getPIData();
                log.debug("event = PROCESSING_INSTRUCTION");
                log.debug("PIData = " + val);
                break;
            case XMLStreamConstants.CHARACTERS :             // 4
                val = reader.getText();
                log.debug("event = CHARACTERS");
                log.debug("Characters = " + val);
                break;
            case XMLStreamConstants.COMMENT :                // 5
                val = reader.getText();
                log.debug("event = COMMENT");
                log.debug("Comment = " + val);
                break;
            case XMLStreamConstants.SPACE :                  // 6
                val = reader.getText();
                log.debug("event = SPACE");
                log.debug("Space = " + val);
                break;
            case XMLStreamConstants.START_DOCUMENT :         // 7
                log.debug("event = START_DOCUMENT");
                log.debug("Document Started.");
                break;
            case XMLStreamConstants.END_DOCUMENT :           // 8
                log.debug("event = END_DOCUMENT");
                log.debug("Document Ended");
                break;
            case XMLStreamConstants.ENTITY_REFERENCE :       // 9
                val = reader.getText();
                log.debug("event = ENTITY_REFERENCE");
                log.debug("Text = " + val);
                break;
            case XMLStreamConstants.DTD :                    // 11
                val = reader.getText();
                log.debug("event = DTD");
                log.debug("DTD = " + val);
                break;
            case XMLStreamConstants.CDATA :                  // 12
                val = reader.getText();
                log.debug("event = CDATA");
                log.debug("CDATA = " + val);
                break;
        }
    }
}

> Invalid UTF-8 chars in the AbderaClient
> ---------------------------------------
>
>                 Key: ABDERA-60
>                 URL: https://issues.apache.org/jira/browse/ABDERA-60
>             Project: Abdera
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>         Environment: N/A
>            Reporter: Chris Berry
>             Fix For: 0.3.0
>
>         Attachments: abdera-utf8-bug.tar.gz
>
>
> After upgrading to the latest 0.3-SNAPSHOT SVN trunk (on ~8/27/2007) from a 0.3-SNAPSHOT download from a couple of months ago,
> and after making all required modifications (to catch up with all the API changes), I am seeing "Invalid UTF-8" errors.
> Note that these errors only occur in the AbderaClient when I call "entry.getContent()".
> I have attached a small, self-contained JUnit test case which reproduces/demonstrates this issue.
> It builds and runs out-of-the-box (using mvn install).
> There is also a README.txt that details the output/issue.
> This JUnit reproduces the error. It is as small as I could get it.
> My Atom Store is based on a Store and StoreProvider (based on code I received from Ugo Cei as a starting point).
> Note that all of the code in src/main/java is relatively fixed between the latest 0.3-SNAPSHOT and the 0.3-SNAPSHOT that works.
> In other words, my code stayed as fixed as possible, and the latest 0.3-SNAPSHOT is the only real variable.
> I'm not saying that the bug isn't in my code, only that it never showed up until my upgrade to the latest 0.3-SNAPSHOT.
> I actually suspect that it may be an issue with Woodstox, which the latest 0.3-SNAPSHOT significantly upgrades.
> Note: I have looked very closely at the XML file(s) that are causing this issue.
> I ran the Unix utility "iconv" on them, and AFAICT they do not contain improper UTF-8.
> Chris Berry
> chriswberry at gmail dot com

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
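
For anyone who wants to repeat the "iconv" check from the issue description without leaving Java, here is a minimal sketch of a strict UTF-8 validity check. It is not part of the attached test case: the class name Utf8Check and the method isValidUtf8 are made up for illustration, and it only assumes the standard java.nio.charset API. The decoder is told to report malformed input instead of silently substituting a replacement character, which is roughly what "iconv -f UTF-8" does when it rejects a file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not from the original posting or from Abdera/Woodstox.
public class Utf8Check {

    // Returns true only if the whole byte array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of inserting U+FFFD
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Checks every file named on the command line; does nothing without arguments.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            byte[] data = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": " + (isValidUtf8(data) ? "valid UTF-8" : "invalid UTF-8"));
        }
    }
}
```

Running it against the blog_9999.xml file from the test above would give a second opinion that is independent of both iconv and the XML parser; a file that passes here but still fails in the AbderaClient would point away from the data itself.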