xerces-j-dev mailing list archives

From "Sandy Gao" <sandy...@ca.ibm.com>
Subject Re: Error recovery while parsing XMLs ..
Date Wed, 16 Apr 2003 13:52:33 GMT
Hi Rajiv,

> Do you think it is possible to improve the recovery by
> making the parser consider the content model during
> the error recovery phase?

For the input:

<icon >
    <small-icon>stillTypingMygif
    <large-icon>/tmp/large.gif</large-icon>
</icon>

By the time the parser realizes that there is a fatal error, it has
already reached </icon>. It can't go back and add </small-icon> before
<large-icon> because, for SAX, the events for <large-icon> have already
been sent to the application.
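
The ordering described above can be observed with a small trace. This is only a sketch: it uses the JAXP SAX parser bundled with the JDK rather than a specific Xerces-J build, and the class name SaxEventOrder and the method trace are made up for illustration. It records element events until the parser reports the fatal error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class, not part of Xerces.
class SaxEventOrder {
    // The malformed input from this thread: <small-icon> is never closed.
    static final String INPUT =
        "<icon >\n"
        + "    <small-icon>stillTypingMygif\n"
        + "    <large-icon>/tmp/large.gif</large-icon>\n"
        + "</icon>\n";

    // Returns the element events in the order the application received them,
    // followed by "fatal" if the parse ended with a SAXParseException.
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so the error surfaces here.
            events.add("fatal");
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace(INPUT));
    }
}
```

In the trace, the start and end events for large-icon appear before the fatal marker, so the application has already consumed them by the time the mismatch at </icon> is detected.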

Thanks,
Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
sandygao@ca.ibm.com



                                                                                         
                                      
Rajiv Shivane <murgus@yahoo.com>
04/08/2003 07:30 AM
Please respond to xerces-j-user

To:       xerces-j-user@xml.apache.org, xerces-j-dev@xml.apache.org
cc:
Subject:  Re: Error recovery while parsing XMLs ..



Hi Sandy,

I completely agree that during error recovery there is
typically more than one "correction" that can be
applied to the stream to make it parseable. Any parser
can at most take a best guess at which correction to
apply to recover from the error.

To make the best guess, is it possible to make the
parser look at the DTD and eliminate some of the
corrections? In the example I gave:

<icon >
    <small-icon>stillTypingMygif
    <large-icon>/tmp/large.gif</large-icon>
</icon>

There is more than one correction with which the
parser can recover. But the element declarations are:

<!ELEMENT icon (small-icon?, large-icon?)>
<!ELEMENT small-icon (#PCDATA)>
<!ELEMENT large-icon (#PCDATA)>

So the best guess in this case should have been to add
</small-icon> before <large-icon>.
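
The reasoning above can be sketched in code. This is a hypothetical helper, not anything that exists in Xerces: the class ContentModelGuess, the ALLOWED table, and the method impliedCloses are invented for illustration, and the table is hand-copied from the three declarations while ignoring order and occurrence constraints. Given the stack of open elements and an incoming start tag, it counts how many open elements would have to be closed before the incoming element is permitted.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Xerces API.
class ContentModelGuess {
    // Child elements each element may contain, taken from the DTD above.
    // The "?" and "," constraints are deliberately ignored in this sketch.
    static final Map<String, Set<String>> ALLOWED = new HashMap<>();
    static {
        ALLOWED.put("icon",
            new HashSet<>(Arrays.asList("small-icon", "large-icon")));
        ALLOWED.put("small-icon", Collections.<String>emptySet()); // (#PCDATA)
        ALLOWED.put("large-icon", Collections.<String>emptySet()); // (#PCDATA)
    }

    // Returns how many open elements must be closed so that `incoming`
    // becomes a permitted child, or -1 if no open element permits it.
    // `open` is iterated innermost element first.
    static int impliedCloses(Deque<String> open, String incoming) {
        int closes = 0;
        for (String element : open) {
            Set<String> allowed = ALLOWED.get(element);
            if (allowed != null && allowed.contains(incoming)) {
                return closes;
            }
            closes++;
        }
        return -1;
    }

    public static void main(String[] args) {
        Deque<String> open = new ArrayDeque<>();
        open.push("icon");
        open.push("small-icon"); // innermost open element
        // <large-icon> arrives while <small-icon> is still open.
        System.out.println(impliedCloses(open, "large-icon"));
    }
}
```

For the example input, small-icon does not permit large-icon but icon does, so the helper suggests closing one element, which corresponds to inserting </small-icon> before <large-icon>. It is still a guess, and it only helps if the decision is made when <large-icon> is first seen, before its events are delivered.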

Do you think it is possible to improve the recovery by
making the parser consider the content model during
the error recovery phase? Could you give me some hints
as to how I can go about doing this?

Thanks!
Rajiv

--- Sandy Gao <sandygao@ca.ibm.com> wrote:
> But how would the parser know the input is not
>
>   <icon >
>    <small-icon>stillTypingMygif
>    <large-icon>/tmp/large.gif</large-icon>
>    </small-icon>
>   </icon>
>
> in which case adding </small-icon> before
> <large-icon> is worse.
>
> In dealing with errors (especially well-formedness
> ones), the best the
> parser can do is to make a guess, which can't be
> guaranteed to be the best
> in all cases.
>
> Thanks,
> Sandy Gao
> Software Developer, IBM Canada
> (1-905) 413-3255
> sandygao@ca.ibm.com



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org






