abdera-dev mailing list archives

From "Chris Berry (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ABDERA-60) Invalid UTF-8 chars in the AbderaClient
Date Tue, 04 Sep 2007 19:28:45 GMT

    [ https://issues.apache.org/jira/browse/ABDERA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524835 ]

Chris Berry commented on ABDERA-60:
-----------------------------------

Just to be sure, I have added code to the previous JUnit test that actually retrieves text
from the XML with Woodstox.
This is pretty unequivocal now...

package com.homeaway.hcdata.store.provider.blogs;

import junit.framework.Test; 
import junit.framework.TestCase; 
import junit.framework.TestSuite;

import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

import java.io.FileInputStream;

import com.ctc.wstx.stax.WstxInputFactory; 

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

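/**
 * Walks the blog XML file with Woodstox directly, to see whether the parser
 * itself throws an Exception on it.
 */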
public class WoodstoxTest extends TestCase {

    private static final Log log = LogFactory.getLog( WoodstoxTest.class );

    private static final String userdir = System.getProperty( "user.dir" );
    private static final String filename = userdir + "/var/blogs/cberry/99/9999/en/blog_9999.xml";
    
    public static Test suite() 
    { return new TestSuite( WoodstoxTest.class ); }

    public void tearDown() throws Exception 
    { super.tearDown(); } 

    public void setUp() throws Exception 
    { super.setUp(); } 

    public void testWoodstox1() throws Exception {
        // we will simply walk the doc and see if it throws an Exception
        XMLInputFactory xif = new WstxInputFactory();
        XMLStreamReader r = xif.createXMLStreamReader( new FileInputStream( filename ) );
        while (r.hasNext()) r.next();
        r.close();
    }

    public void testWoodstox2() throws Exception {
        // walk the doc again, this time retrieving the text of each event
        XMLInputFactory xif = new WstxInputFactory();
        XMLStreamReader reader = xif.createXMLStreamReader( new FileInputStream( filename ) );

        while ( reader.hasNext() ) {
            printEventInfo( reader );
        }
        reader.close();
    }

    private static void printEventInfo(XMLStreamReader reader) throws XMLStreamException {
        int eventCode = reader.next();
        String val = null;
        // XMLStreamReader inherits the event-type constants from XMLStreamConstants
        switch (eventCode) {
            case XMLStreamReader.START_ELEMENT :
                val = reader.getLocalName();
                log.debug("event = START_ELEMENT");
                log.debug("Localname = " + val);
                break;
            case XMLStreamReader.END_ELEMENT :
                val = reader.getLocalName();
                log.debug("event = END_ELEMENT");
                log.debug("Localname = " + val);
                break;
            case XMLStreamReader.PROCESSING_INSTRUCTION :
                val = reader.getPIData();
                log.debug("event = PROCESSING_INSTRUCTION");
                log.debug("PIData = " + val);
                break;
            case XMLStreamReader.CHARACTERS :
                val = reader.getText();
                log.debug("event = CHARACTERS");
                log.debug("Characters = " + val);
                break;
            case XMLStreamReader.COMMENT :
                val = reader.getText();
                log.debug("event = COMMENT");
                log.debug("Comment = " + val);
                break;
            case XMLStreamReader.SPACE :
                val = reader.getText();
                log.debug("event = SPACE");
                log.debug("Space = " + val);
                break;
            case XMLStreamReader.START_DOCUMENT :
                log.debug("event = START_DOCUMENT");
                log.debug("Document Started.");
                break;
            case XMLStreamReader.END_DOCUMENT :
                log.debug("event = END_DOCUMENT");
                log.debug("Document Ended");
                break;
            case XMLStreamReader.ENTITY_REFERENCE :
                val = reader.getText();
                log.debug("event = ENTITY_REFERENCE");
                log.debug("Text = " + val);
                break;
            case XMLStreamReader.DTD :
                val = reader.getText();
                log.debug("event = DTD");
                log.debug("DTD = " + val);
                break;
            case XMLStreamReader.CDATA :
                val = reader.getText();
                log.debug("event = CDATA");
                log.debug("CDATA = " + val);
                break;
        }
    }

}
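
For anyone who wants to try the same walk without the blog file, here is a minimal sketch against an in-memory document. It assumes only the JDK's default StAX lookup (XMLInputFactory.newInstance()) rather than WstxInputFactory, so it is not a Woodstox-specific test. The class name StaxTextWalk, the method collectText, and the sample XML are hypothetical and only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTextWalk {

    // Walks every event and concatenates the text of CHARACTERS and CDATA events.
    // An XMLStreamException propagates if the parser rejects the input.
    static String collectText(byte[] xmlBytes) throws XMLStreamException {
        // Assumption: whatever StAX implementation the JDK lookup finds, not Woodstox specifically.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document containing one non-ASCII character (e with acute accent).
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<entry><content>caf\u00e9</content></entry>";
        System.out.println(collectText(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the parser itself objected to the bytes, the exception would surface from reader.next(), which is the same thing testWoodstox1 and testWoodstox2 rely on.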


> Invalid UTF-8 chars in the AbderaClient
> ---------------------------------------
>
>                 Key: ABDERA-60
>                 URL: https://issues.apache.org/jira/browse/ABDERA-60
>             Project: Abdera
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>         Environment: N/A
>            Reporter: Chris Berry
>             Fix For: 0.3.0
>
>         Attachments: abdera-utf8-bug.tar.gz
>
>
> After upgrading to the latest 0.3-SNAPSHOT SVN trunk (on ~8/27/2007) from a 0.3-SNAPSHOT download from a couple of months ago,
> and after making all required modifications (to catch up with all the API changes), I am seeing "Invalid UTF-8" errors.
> Note that these errors only occur in the AbderaClient when I call "entry.getContent()".
>
> I have attached a small, self-contained JUnit test case which reproduces/demonstrates this issue.
> It builds and runs out-of-the-box (using mvn install).
> There is also a README.txt that details the output/issue.
> This JUnit reproduces the error. It is as small as I could get it.
> My Atom Store is based on a Store and StoreProvider (based on code I received from Ugo Cei as a starting point).
> Note that all of the code in src/main/java is relatively fixed between the latest 0.3-SNAPSHOT and the 0.3-SNAPSHOT that works.
> In other words, my code stayed as fixed as possible, and the latest 0.3-SNAPSHOT is the only real variable.
> I'm not saying that the bug isn't in my code, only that it never showed up until my upgrade to 0.3-SNAPSHOT.
> I actually suspect that it may be an issue w/ woodstox, which the latest 0.3-SNAPSHOT significantly upgrades.
> Note: I have looked very closely at the XML file(s) that is causing this issue.
> I used the Unix util "iconv" on them, and AFAICT they do not contain improper UTF-8.
> Chris Berry
> chriswberry at gmail dot com
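
The iconv check mentioned in the description can be approximated from Java as well. The following is only a sketch, assuming a strict java.nio decoder is an acceptable stand-in for iconv; the class name Utf8Check, the method isValidUtf8, and the sample byte arrays are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: a well-formed string and a deliberately broken byte pair.
        byte[] good = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = { (byte) 0xC3, (byte) 0x28 }; // lead byte followed by a non-continuation byte
        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

To check a real file, the bytes could be read with java.nio.file.Files.readAllBytes and passed to isValidUtf8. A file that passes this check but still fails in the client would point away from the file contents themselves.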

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

