xerces-c-dev mailing list archives

From "Jesse Pelton" <...@PKC.com>
Subject RE: "Invalid Document Structure" error during parsing
Date Mon, 01 Nov 2004 14:23:00 GMT
I think you're on to something when you inspect the document contents.
However, I'd consider three changes to make the process more robust, and
there are a couple of simple command-line tests you can run to try to
discover whether your input is being altered before your program sees
it. (I think this is virtually certain.)
 
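The first test program alteration, and probably the most important, is
to hex dump your input rather than using printf(). printf("%s") stops at
the first zero byte and can hide non-printing bytes (a byte-order mark
or a stray carriage return, for example), so two inputs that look the
same on screen may still differ.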
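
As a rough sketch of what I mean (the helper name hexDump is made up for
illustration and is not part of Xerces-C), something like this formats a
buffer as an offset, the hex bytes, and a printable-text column:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format `len` bytes from `data` as hex, 16 bytes per row, with the offset
// on the left and a printable-ASCII column on the right.
// hexDump is a hypothetical helper name, not a Xerces-C function.
std::string hexDump(const unsigned char *data, std::size_t len)
{
    std::string out;
    char field[16];
    for (std::size_t row = 0; row < len; row += 16) {
        std::snprintf(field, sizeof field, "%08lx ",
                      static_cast<unsigned long>(row));
        out += field;
        std::string text;
        for (std::size_t i = row; i < row + 16; ++i) {
            if (i < len) {
                std::snprintf(field, sizeof field, " %02x",
                              static_cast<unsigned>(data[i]));
                out += field;
                // Show printable ASCII as-is and everything else as '.'
                text += (data[i] >= 0x20 && data[i] < 0x7f)
                            ? static_cast<char>(data[i]) : '.';
            } else {
                out += "   ";  // pad a short final row so the text column lines up
            }
        }
        out += "  |" + text + "|\n";
    }
    return out;
}
```

In the test program, fputs(hexDump((const unsigned char *)inbuf, n).c_str(), stdout)
could stand in for the printf(), where n is assumed to be the byte count
returned by readBytes().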
 
Second, make sure you're dumping the entire document. As the program is
currently written, it shows at most the first 4999 bytes, so you won't
see a problem that occurs later in a larger file.
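
One way to get everything is to keep reading until the read call reports
zero bytes, which is how BinInputStream::readBytes() signals end of
input. The sketch below hides the actual read behind a callable so it
does not depend on Xerces; readAll and chunkSize are names I made up for
illustration:

```cpp
#include <cstddef>
#include <vector>

// Keep calling read(buffer, maxBytes) until it reports 0 bytes, and return
// everything that was read. `read` stands in for whatever supplies the data
// (hypothetical interface: it returns the number of bytes it stored).
template <typename ReadFn>
std::vector<unsigned char> readAll(ReadFn read, std::size_t chunkSize = 4096)
{
    std::vector<unsigned char> all;
    std::vector<unsigned char> chunk(chunkSize);
    for (;;) {
        const std::size_t got = read(chunk.data(), chunk.size());
        if (got == 0) {
            break;  // end of input
        }
        all.insert(all.end(), chunk.begin(),
                   chunk.begin() + static_cast<std::ptrdiff_t>(got));
    }
    return all;
}
```

With a BinInputStream, the callable would simply forward to readBytes(),
assuming XMLByte is unsigned char on your platform.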
 
Finally, consider altering the test program to dump the input regardless
of where the input comes from. If the dump differs when the input comes
from stdin versus the file, then either cat/type or the standard input
handling has altered the stream.
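
If you capture both versions in memory, a small comparison can point at
the first byte that differs instead of leaving you to eyeball two dumps.
This is only a sketch; firstDifference is a made-up helper name:

```cpp
#include <cstddef>
#include <vector>

// Return the offset of the first byte at which `a` and `b` differ, or -1 if
// they are identical. If one is a prefix of the other, the offset returned
// is the length of the shorter one.
long firstDifference(const std::vector<unsigned char> &a,
                     const std::vector<unsigned char> &b)
{
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) {
            return static_cast<long>(i);
        }
    }
    return a.size() == b.size() ? -1L : static_cast<long>(common);
}
```

A result of -1 for the stdin and file versions would mean the bytes are
the same and the problem lies elsewhere.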
 
The first command line test is to compare the results of "cat sim.xml |
hexdump" and "hexdump sim.xml". (I'm assuming your Linux box has
hexdump.) It seems likely to me that they will differ, in which case
either cat or the pipe has altered the stream. Smart money would be on
cat being the culprit, though I don't know why it would change the
stream.
 
The second command line test is "./test < sim.xml", which will stream
sim.xml directly to your test program's standard input (rather than
allowing cat a chance to alter it). If this works, cat is almost
certainly guilty of transforming the file.


________________________________

	From: Aditya Kulkarni [mailto:aditya.kulkarni@veritas.com] 
	Sent: Monday, November 01, 2004 5:01 AM
	To: xerces-c-dev@xml.apache.org
	Subject: "Invalid Document Structure" error during parsing
	
	

	Hi all,

	I am getting the "Invalid document structure" error while parsing
	an XML file. The scenario is as follows:

	[1] Platform: Red Hat Linux 9 and 32-bit Windows

	[2] Not multithreaded

	[3] I am passing an instance of the StdInInputSource class as the
	argument to the XercesDOMParser::parse() method.

	[4] My test program is invoked this way: cat sim.xml | ./test
	(type sim.xml | test on Win32)

	[5] When I run my program directly on the same XML file (invoked
	as ./test sim.xml), it gives the desired output. In that case I
	pass an instance of the LocalFileInputSource class as the argument
	to the XercesDOMParser::parse() method.

	What could be causing this message?

	Here is the snippet of code that I am talking about.

	<snip>
	// Return type assumed to be int: the body returns XML_PARSER_* codes.
	int XMLParser::parse(void)
	{
	    char            whoami[] = "XMLParser::parse";
	    XMLCh           *x_file_name = NULL;

	    // [0]
	    XMLPlatformUtils::Initialize();

	    // [1]
	    m_parser = new XercesDOMParser();
	    NULLCHECK(m_parser, whoami, "Failed to create XercesDOMParser instance", XML_PARSER_NOMEM);

	    // [2]
	    m_parser->setDoNamespaces(true);
	    m_parser->setValidationScheme(XercesDOMParser::Val_Always);
	    m_parser->setDoSchema(true);
	    m_parser->setCreateEntityReferenceNodes(false);
	    m_parser->setCreateCommentNodes(false);
	    m_parser->setIncludeIgnorableWhitespace(false);

	    m_ehandler = (ErrorHandler *)new HandlerBase();
	    if (NULL != m_ehandler) {
	        m_parser->setErrorHandler(m_ehandler);
	    }

	    // create input source.
	    if (m_readFromStdin == true) {
	        m_inputSource = new StdInInputSource();
	        NULLCHECK(m_inputSource, whoami, "StdInInputSource failed", XML_PARSER_NOMEM);
	    } else {
	        x_file_name = XMLString::transcode(m_xmlFile.c_str());
	        NULLCHECK(x_file_name, whoami, "XMLString::transcode failed", XML_PARSER_FAILED);
	        m_inputSource = new LocalFileInputSource((const XMLCh *)x_file_name);
	        NULLCHECK(m_inputSource, whoami, "LocalFileInputSource failed", XML_PARSER_NOMEM);
	    }

	    // [3]
	    try {
	        m_parser->parse((const InputSource &)*m_inputSource);
	        //        m_parser->parse((const char *)m_xmlFile.c_str());
	    } catch (const XMLException &x) {
	        handleXMLException(x);
	    } catch (const DOMException &d) {
	        handleDOMException(d);
	    } catch (const SAXParseException &sp) {
	        handleSAXParseException(sp);
	    } catch (const SAXException &s) {
	        handleSAXException(s);
	    }

	    if (m_parser->getErrorCount() != 0) {
	        printf("Errors encountered during parsing file %s\n", m_xmlFile.c_str());
	        return XML_PARSER_FAILED;
	    }
	</snip>

	 

	I also tried looking at the contents that reach the test program
	through StdInInputSource. The input that I get from StdInInputSource
	looks identical to what is in the file. Here is the code I used to
	see what comes from StdInInputSource.

	 

	<snip>
	    // check if input stream is correct
	    if (m_readFromStdin == true) {
	        bin_in_stream = m_inputSource->makeStream();
	        if (NULL == bin_in_stream) {
	            printf("bin_in_stream == NULL\n");
	            return XML_PARSER_FAILED;
	        }
	        memset(inbuf, 0, 5000);
	        // reads at most 4999 bytes, so the last zeroed byte terminates the string
	        bin_in_stream->readBytes((XMLByte *const)inbuf, 4999);
	        printf("%s\n", inbuf);
	    }
	</snip>

	 

	Thanks,

	Aditya Kulkarni

