[ https://issues.apache.org/jira/browse/STDCXX-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jojo Jose updated STDCXX-1053:
------------------------------
Description:
Hi All,
Please let me know if anybody can provide some clue on this.
I have been using Xerces as the XML parser in my C++ application, and I recently migrated
from Xerces version 1.3 (very old) to 3.1.
Since then, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...})
with a Unicode file as input, it raises an exception. However, the same call works fine for an ANSI file.
The call stack is shown below.
xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
EPConfigTool.dll!XCfgXMLParser::parse() Line 66 - // My application code
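Since the failure depends on the file's encoding, it may help to check which byte-order mark (if any) the "Unicode" file actually starts with. The following is only a minimal diagnostic sketch in standard C++; detectBom is a hypothetical helper and not part of Xerces, and it does not distinguish UTF-32 from UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): names the encoding implied by a
// leading byte-order mark, or "none" if the buffer does not start with one.
// Assumption: UTF-32 BOMs are not handled, so a UTF-32LE file would be
// reported as "UTF-16LE".
std::string detectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}
```

Reading the first few bytes of the failing file and passing them to this function shows whether the file carries a BOM at all, which is useful to know before looking at the scanner itself.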
In the code, execution reaches the following branch:
else
{
    emitError(XMLErrs::InvalidDocumentStructure);
    ...
}
The function in which the parse fails is shown below:
void XMLScanner::scanProlog()
{
    bool sawDocTypeDecl = false;
    // Get a buffer for whitespace processing
    XMLBufBid bbCData(&fBufMgr);
    // Loop through the prolog. If there is no content, this could go all
    // the way to the end of the file.
    try
    {
        while (true)
        {
            const XMLCh nextCh = fReaderMgr.peekNextChar();
            if (nextCh == chOpenAngle)
            {
                // Ok, it could be the xml decl, a comment, the doc type line,
                // or the start of the root element.
                if (checkXMLDecl(true))
                {
                    // There shall be at least --ONE-- space in between
                    // the tag '<?xml' and the VersionInfo.
                    //
                    // If we are not at line 1, col 6, then the decl was not
                    // the first text, so it's invalid.
                    const XMLReader* curReader = fReaderMgr.getCurrentReader();
                    if ((curReader->getLineNumber() != 1)
                    ||  (curReader->getColumnNumber() != 7))
                    {
                        emitError(XMLErrs::XMLDeclMustBeFirst);
                    }
                    scanXMLDecl(Decl_XML);
                }
                else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                {
                    scanPI();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                {
                    scanComment();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                {
                    if (sawDocTypeDecl) {
                        emitError(XMLErrs::DuplicateDocTypeDecl);
                    }
                    scanDocTypeDecl();
                    sawDocTypeDecl = true;
                    // if reusing grammar, this has been validated already in first scan
                    // skip for performance
                    if (fValidate && fGrammar && !fGrammar->getValidated())
                    {
                        // validate the DTD scan so far
                        fValidator->preContentValidation(fUseCachedGrammar, true);
                    }
                }
                else
                {
                    // Assume it's the start of the root element
                    return;
                }
            }
            else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
            {
                // If we have a document handler then gather up the
                // whitespace and call back. Otherwise just skip over spaces.
                if (fDocHandler)
                {
                    fReaderMgr.getSpaces(bbCData.getBuffer());
                    fDocHandler->ignorableWhitespace
                    (
                        bbCData.getRawBuffer()
                        , bbCData.getLen()
                        , false
                    );
                }
                else
                {
                    fReaderMgr.skipPastSpaces();
                }
            }
            else
            {
                emitError(XMLErrs::InvalidDocumentStructure);
                // Watch for end of file and break out
                if (!nextCh)
                    break;
                else
                    fReaderMgr.skipPastChar(chCloseAngle);
            }
        }
    }
    catch(const EndOfEntityException&)
    {
        // We should never get an end of entity here. They should only
        // occur within the doc type scanning method, and not leak out to
        // here.
        emitError
        (
            XMLErrs::UnexpectedEOE
            , "in prolog"
        );
    }
}
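To make the failing branch easier to reason about, here is a simplified stand-alone model of the three top-level branches in the loop above. It is only a sketch: PrologBranch and classifyPrologChar are hypothetical names, the whitespace test assumes the four XML 1.0 whitespace characters, and char16_t stands in for XMLCh. The idea that the file is being decoded with the wrong byte order is an assumption, not something confirmed for this issue.

```cpp
// Simplified, hypothetical model of the top-level branches in scanProlog():
// the next character is either '<', whitespace, or anything else, and
// "anything else" is what leads to InvalidDocumentStructure.
enum class PrologBranch { Markup, Whitespace, InvalidDocumentStructure };

PrologBranch classifyPrologChar(char16_t ch)
{
    if (ch == u'<')
        return PrologBranch::Markup;
    // Assumption: XML 1.0 whitespace only (space, tab, line feed, carriage return).
    if (ch == 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D)
        return PrologBranch::Whitespace;
    return PrologBranch::InvalidDocumentStructure;
}
```

Under this model, a '<' stored as UTF-16LE (bytes 3C 00) but decoded as big-endian would arrive as 0x3C00 and fall into the last branch, as would a stray BOM character (0xFEFF) that was not consumed by the reader.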
It works fine when I move back to version 1.3, but due to various other requirements
I have to use the new version 3.1 in my application.
Thanks in advance,
Jojo
was:
Hi All,
Please let me know if anybody can provide some clue on this.
I have been using Xerces as the XML parser in my C++ application, and I recently migrated
from Xerces version 1.3 (very old) to 3.1.
Since then, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...})
with a Unicode file as input, it raises an exception. However, the same call works fine for an ANSI file.
The call stack is shown below.
xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
EPConfigTool.dll!XCfgXMLParser::parse() Line 66 - <b>My application code</b>
It works fine when I move back to version 1.3, but due to various other requirements
I have to use the new version 3.1 in my application.
Thanks in advance,
Jojo
Added the exact code at which this fails.
> Xerces is popping up an exception while parsing a Unicode file, but the same works fine for an ANSI file
> -----------------------------------------------------------------------------------------------------
>
> Key: STDCXX-1053
> URL: https://issues.apache.org/jira/browse/STDCXX-1053
> Project: C++ Standard Library
> Issue Type: Bug
> Components: 20. General Utilities
> Environment: Windows XP
> Reporter: Jojo Jose
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.