commons-user mailing list archives

From Simon Kitching
Subject Re: [Digester] How can I 'shortcut' a parse?
Date Sun, 01 May 2005 21:14:54 GMT
On Sun, 2005-05-01 at 12:27 -0500, Worley Kevin wrote:
> Howdy,
> I would like to know if anyone can tell me how to 'shortcut' a parse.
> The XML file that I need to parse does not use Namespaces, but the rules
> I need to use vary based on information in the "header" section.
> Basically, the file is something like:
> <bulkdata>
>   <header>
>     <PayloadID>X956487</PayloadID>
>     <GenDate>1967-08-13</GenDate>
>     <Mode>ABC</Mode>
>     <ContactEmail></ContactEmail>
>     <SupplierID>A123456789B</SupplierID>
>   </header>
>   <body>
>     <LineItem LineNumber="0">
>       <ID>
>         <Supplier>XYX Corp.</Supplier>
>         <SupplierGroup>Group a</SupplierGroup>
>         <ReferenceID>6565656</ReferenceID>
>         <EmployeeName>John Doe</EmployeeName>
>         <EmployeeNumber>123</EmployeeNumber>
>         <BLNumber>AW54645664Z</BLNumber>
>         <AWB>456789456</AWB>
>         <HAWB>456789789</HAWB>
>       </ID>
> 	...
>     </LineItem>
>   </body>
> </bulkdata>
> The 'body' can consist of thousands of 'LineItem' elements which are
> each much more extensive than shown here.  Currently, I use Digester
> with rules to parse the header.  When complete, I can look at what was
> returned and set the rules to correctly parse the 'body' of the file.
> This works, but requires the digester to completely parse the file
> twice.  I'd really like to avoid doing it this way.
> Does anyone know of a way I can have the parser simply end after the
> 'header' section is parsed?  

I think you could write your own Rule class that throws an exception:

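class CancelParseException extends SAXException {
    CancelParseException() { super("parse cancelled"); }
}

class CancelParseRule extends Rule {
    public void begin(String namespace, String name, Attributes attributes)
            throws Exception {
        throw new CancelParseException();
    }
}

digester.addRule("bulkdata/body", new CancelParseRule());
try {
    digester.parse(input);  // 'input' stands for your file or stream
} catch (SAXException ex) {
    // Digester may wrap exceptions thrown by rules, so check the nested one too
    if (!(ex instanceof CancelParseException
            || ex.getException() instanceof CancelParseException)) {
        throw ex;
    }
    // ok: parsing was stopped deliberately at <body>
}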
The above is only pseudocode; not tested!
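
For something that runs without Digester on the classpath, the same
stop-by-exception idea can be sketched with a plain SAX handler from the
JDK. The class and method names here (CancelParseDemo, StopAtHandler,
elementsBefore) are made up for illustration; with Digester the throw
would sit in a Rule rather than in a DefaultHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CancelParseDemo {

    /** Marker exception used only to signal a deliberate early stop. */
    static class CancelParseException extends SAXException {
        CancelParseException() { super("parse cancelled"); }
    }

    /** Records element names and aborts when the stop element starts. */
    static class StopAtHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();
        private final String stopAt;

        StopAtHandler(String stopAt) { this.stopAt = stopAt; }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (qName.equals(stopAt)) {
                throw new CancelParseException();
            }
            seen.add(qName);
        }
    }

    /** Returns the names of the elements seen before the stop element. */
    static List<String> elementsBefore(String xml, String stopAt) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StopAtHandler handler = new StopAtHandler(stopAt);
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (CancelParseException ex) {
            // expected: the handler stopped the parse on purpose
        }
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bulkdata><header><Mode>ABC</Mode></header>"
                   + "<body><LineItem/></body></bulkdata>";
        System.out.println(elementsBefore(xml, "body"));
    }
}
```

Any other SAXException (malformed input, for instance) still propagates
out of elementsBefore, so a deliberate stop is not confused with a real
parse error.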

Alternatively, run the XML input through an org.xml.sax.XMLFilter before
passing it to Digester to discard the unwanted XML on the first pass.
The XML will still be parsed, but at least Digester won't have to
process it. Or use XSLT to do the same job of "filtering" the input.

Note that Digester is a ContentHandler, so can be passed to
parser.setContentHandler rather than calling digester.parse in order to
incorporate it in a "pipeline" of SAX events. This can be useful when
playing tricks with xml such as filtering/transforming before passing
data to digester.
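
A rough sketch of the filter and pipeline ideas together, using only JDK
classes. SkipElementFilter and forwardedElements are hypothetical names,
and the anonymous DefaultHandler is just a stand-in for the Digester
instance you would pass to setContentHandler in real code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SkipElementFilterDemo {

    /** Drops the named element and everything nested inside it. */
    static class SkipElementFilter extends XMLFilterImpl {
        private final String skipName;
        private int depth = 0; // > 0 while inside the skipped element

        SkipElementFilter(XMLReader parent, String skipName) {
            super(parent);
            this.skipName = skipName;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || qName.equals(skipName)) {
                depth++;          // swallow this start tag
                return;
            }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (depth > 0) {
                depth--;          // swallow the matching end tag
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Returns the start tags that actually reach the downstream handler. */
    static String forwardedElements(String xml, String skipName) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipElementFilter filter = new SkipElementFilter(parser, skipName);
        final StringBuilder out = new StringBuilder();
        // Stand-in for Digester: any ContentHandler can sit at the end of the pipeline.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                out.append('<').append(qName).append('>');
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

The parser still reads the whole document here; the filter only keeps
the skipped events from reaching the handler at the end of the chain.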
