ws-commons-dev mailing list archives

From "Andreas Veithen (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (WSCOMMONS-394) StAXUtils: Add Network Detached XMLStreamReader capability
Date Mon, 15 Dec 2008 16:51:44 GMT

     [ https://issues.apache.org/jira/browse/WSCOMMONS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Veithen reopened WSCOMMONS-394:
---------------------------------------

      Assignee: Andreas Veithen  (was: Rich Scheuerle)

While I agree with the analysis, I think the proposed solution is suboptimal:

1) The "network detached" XMLStreamReader will still try to connect to the network to retrieve
the external DTD subset. This doesn't solve the performance problem. It might even make it
worse if the network error is only triggered after a timeout.

2) In order to provide predictable results, Axiom should either attempt to load the DTD and
report an error if it fails, or not attempt to load the DTD at all. The current solution might
lead to subtle bugs when a machine that normally is connected to the network is suddenly disconnected.

3) The current solution simply ignores the error and continues to pull events from the parser.
However, there is a risk that the parser remains in an inconsistent state after the error.
WSCOMMONS-372 shows a case where, after XMLStreamReader#getText() throws an exception
caused by an unexpected end of stream, Woodstox happily continues to return events. This might
also happen with the current solution.

One of the problems is that even if IS_SUPPORTING_EXTERNAL_ENTITIES is set to false, Woodstox
still tries to load the external DTD subset. This can be avoided by registering a custom XMLResolver
that simply returns an empty document when asked to load the DTD. I tested this solution and
it gives the expected result. In particular, the parser no longer throws an exception, so
that we can get rid of the workaround implemented in StAXOMBuilder#getDTDText().
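
A minimal sketch of the resolver approach described above, using only the standard
javax.xml.stream API. The class name DetachedReaderSketch, the constant EMPTY_RESOLVER and
the method createDetachedReader are hypothetical names chosen for illustration; this is not
the code that was tested or committed for this issue, and how a given parser (Woodstox or
another) reacts to the resolver should be verified against that parser.

```java
import java.io.ByteArrayInputStream;
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; names are illustrative and not part of Axiom.
final class DetachedReaderSketch {

    // Answers every resolution request with an empty document, so the parser
    // has nothing to fetch from the system ID given in the DOCTYPE.
    static final XMLResolver EMPTY_RESOLVER =
            (publicID, systemID, baseURI, namespace) -> new ByteArrayInputStream(new byte[0]);

    static XMLStreamReader createDetachedReader(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Per the discussion above, this property alone may not stop the parser
        // from loading the external DTD subset, hence the resolver below.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        factory.setXMLResolver(EMPTY_RESOLVER);
        return factory.createXMLStreamReader(in);
    }
}
```

A caller would pass a Reader over a document such as web.xml and then pull events from the
returned XMLStreamReader as usual.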

If there are no objections, I will clean up my solution and then commit it.

I also noticed that the test case in OMDTDTest is not entirely correct. It tries to
simulate a network error using a malformed URL. However, even if the parser didn't try to load
the DTD, it would still be allowed to complain about the invalid URL. The test case should
use a well-formed URL but make sure that there is no document at that URL.
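
One way to obtain such a URL without touching the network is sketched below: create a
temporary file, delete it, and use its file: URL as the system ID. This is only an
illustration of the idea. The class MissingDtdFixture and its methods are hypothetical and
are not part of OMDTDTest.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;

// Hypothetical test fixture; names are illustrative only.
final class MissingDtdFixture {

    // Returns a well-formed file: URL at which no document exists.
    static URL missingDtdUrl() throws IOException {
        File file = File.createTempFile("missing", ".dtd");
        if (!file.delete()) {
            throw new IOException("Could not delete " + file);
        }
        return file.toURI().toURL();
    }

    // Builds a small document whose DOCTYPE points at the missing DTD.
    static String documentWithMissingDtd() throws IOException {
        return "<!DOCTYPE root SYSTEM \"" + missingDtdUrl() + "\"><root/>";
    }
}
```

Because the URL is syntactically valid, any failure observed while parsing the document can
be attributed to the missing DTD rather than to URL parsing.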

> StAXUtils: Add Network Detached XMLStreamReader capability
> ----------------------------------------------------------
>
>                 Key: WSCOMMONS-394
>                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-394
>             Project: WS-Commons
>          Issue Type: Improvement
>          Components: AXIOM
>            Reporter: Rich Scheuerle
>            Assignee: Andreas Veithen
>             Fix For: Axiom 1.2.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Background:
> The JSR 173 (StAX) Specification did not do an adequate job defining the semantics for
> processing DTD DOCTYPE constructs.
> The reference implementation's getValue() returns the entire subset of the DOCTYPE instead
> of returning the instance (docinfo) information.
> This is a known issue and has been discussed on the forum.
> http://markmail.org/message/im6f2yu2y544k3he
> The problem is worse if the DOCTYPE references an external location.  To get the subset,
> the parser implementation must do a network call.
> This is (a) ill-performant and (b) requires the application to be attached to a network.
> In addition, the various parser implementations have different mechanisms for getting
> the DOCTYPE subset.  Some implementations apparently defer the processing until the
> getText() call, while other implementations load the subset when the tag is processed.
>
> Problem Scenario:
> Configuration and deployment files (e.g. web.xml) often contain DOCTYPE constructs.
> In many situations, the deployer may not be connected to the network when processing
> the file.  In such a scenario, the deployer needs a mechanism to process the file
> without being hindered by the DOCTYPE processing.
>
> Solution:
> The proposed solution is to add new methods to StAXUtils:
>    XMLStreamReader getNetworkDetachedXMLStreamReader(...)
> A caller (e.g. a deployer application) can use the new methods to safely obtain an
> XMLStreamReader that is configured for a network detached environment.
> As StAX changes, we can update the implementation of the methods.
>
> Next Action:
> I am working on the proposed solution and tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
