abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Async Parsing?
Date Thu, 13 Jul 2006 00:57:27 GMT
I can't readily envision a use for nonblocking I/O operations in an XML
parser, and I don't think I've ever seen anyone do it before.

In any case, I figured you might find this entertaining:

http://www.snellspace.com/wp/?p=381

http://danga.com:8081/atom-stream.xml is a never-ending xml stream.

    URL url = new URL("http://danga.com:8081/atom-stream.xml");
    // we only care about the feed title and alternate link,
    // we'll ignore everything else
    ParseFilter filter = new WhiteListParseFilter();
    filter.add(new QName("atomStream"));
    filter.add(Constants.FEED);
    filter.add(Constants.TITLE);
    filter.add(Constants.LINK);
    ParserOptions options = Parser.INSTANCE.getDefaultParserOptions();
    options.setParseFilter(filter);
    Document doc = Parser.INSTANCE.parse(
      url.openStream(),(URI)null,options);
    Element el = doc.getRoot();
    // get the first feed in the stream, then continue to iterate
    // from there, printing the title and alt link to the console
    Feed feed = el.getFirstChild(Constants.FEED);
    while (feed != null) {
      System.out.println(
        feed.getTitle() + "\t" + feed.getAlternateLink().getHref());
      Feed next = feed.getNextSibling(Constants.FEED);
      feed.discard();
      feed = next;
    }

There are some memory-creep issues so I wouldn't recommend keeping this
running forever :-)
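
For anyone who wants to try the same pull-and-discard idea without Abdera,
here is a rough sketch using the JDK's StAX pull parser instead. This is a
swapped-in technique, not Abdera's API: the class name FeedTitleStream, the
titles() helper, and the max cutoff are made up for illustration, and it
assumes each feed title is plain text. Nothing is kept in memory except the
titles it returns, and the cutoff is there because the real stream never ends.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Abdera: a StAX-based analogue of the loop above.
class FeedTitleStream {

  // Reads events one at a time and collects the text of every <title> that is
  // a direct child of <feed>, stopping after max titles. Assumes plain-text titles.
  static List<String> titles(InputStream in, int max) {
    List<String> out = new ArrayList<>();
    Deque<String> path = new ArrayDeque<>(); // names of the currently open elements
    try {
      XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
      try {
        while (out.size() < max && r.hasNext()) {
          int event = r.next(); // blocks until the next event can be read
          if (event == XMLStreamConstants.START_ELEMENT) {
            String name = r.getLocalName();
            if ("title".equals(name) && "feed".equals(path.peek())) {
              // getElementText() also consumes the matching end tag,
              // so this element is never pushed onto the path.
              out.add(r.getElementText());
            } else {
              path.push(name);
            }
          } else if (event == XMLStreamConstants.END_ELEMENT) {
            path.pop();
          }
        }
      } finally {
        r.close();
      }
    } catch (XMLStreamException e) {
      // Wrapped so that callers do not need a checked exception.
      throw new IllegalStateException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    String xml = "<atomStream>"
        + "<feed xmlns='http://www.w3.org/2005/Atom'><title>A</title>"
        + "<entry><title>ignored</title></entry></feed>"
        + "<feed xmlns='http://www.w3.org/2005/Atom'><title>B</title></feed>"
        + "</atomStream>";
    InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    for (String title : titles(in, 10)) {
      System.out.println(title);
    }
  }
}
```

Like the Abdera version, every call to next() still blocks on the underlying
stream until enough bytes have arrived.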

- James

Garrett Rooney wrote:
> On 7/12/06, James M Snell <jasnell@gmail.com> wrote:
>> Right now documents are parsed incrementally.  When I call
>> Parser.INSTANCE.parse(...) what I'm handed back is not yet a fully
>> parsed document.  As I go through the various getters, the document is
>> parsed up to whatever point it needs to answer that method call.  It
>> could definitely be better, however.
> 
> Sure, but if you call getRoot() it's still going to block until it
> reads enough off of the InputStream to parse the root element, right?
> There's no nonblocking interface.
> 
> -garrett
> 
