Mailing-List: contact abdera-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: abdera-dev@incubator.apache.org
Message-ID: <44B59A77.5020404@gmail.com>
Date: Wed, 12 Jul 2006 17:57:27 -0700
From: James M Snell <jasnell@gmail.com>
User-Agent: Thunderbird 1.5.0.4 (X11/20060516)
MIME-Version: 1.0
To: abdera-dev@incubator.apache.org
Subject: Re: Async Parsing?
References: <7edfeeef0607121523jef78d8ag2251de20256bea54@mail.gmail.com>
	 <44B57958.1090608@gmail.com>
 <7edfeeef0607121539n5a1fea8cg105ece2b275c52ec@mail.gmail.com>
In-Reply-To: <7edfeeef0607121539n5a1fea8cg105ece2b275c52ec@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

I can't readily envision a use for nonblocking I/O in an XML parser,
and I don't think I've ever seen anyone do it before.

In any case, I figured you might find this entertaining:

http://www.snellspace.com/wp/?p=381

http://danga.com:8081/atom-stream.xml is a never-ending XML stream.

    URL url = new URL("http://danga.com:8081/atom-stream.xml");
    // we only care about the feed title and alternate link,
    // we'll ignore everything else
    ParseFilter filter = new WhiteListParseFilter();
    filter.add(new QName("atomStream"));
    filter.add(Constants.FEED);
    filter.add(Constants.TITLE);
    filter.add(Constants.LINK);
    ParserOptions options = Parser.INSTANCE.getDefaultParserOptions();
    options.setParseFilter(filter);
    Document doc = Parser.INSTANCE.parse(
      url.openStream(),(URI)null,options);
    Element el = doc.getRoot();
    // get the first feed in the stream, then continue to iterate
    // from there, printing the title and alt link to the console
    Feed feed = el.getFirstChild(Constants.FEED);
    while (feed != null) {
      System.out.println(
        feed.getTitle() + "\t" + feed.getAlternateLink().getHref());
      Feed next = feed.getNextSibling(Constants.FEED);
      feed.discard();
      feed = next;
    }

There are some memory-creep issues so I wouldn't recommend keeping this
running forever :-)
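
For anyone who wants to poke at the pull-on-demand idea without Abdera,
here is a rough sketch using plain StAX (javax.xml.stream). It is not
Abdera's implementation. The class name PullSketch, the feedTitles
helper, the limit parameter and the inline sample document are all made
up for illustration. Each reader.next() call parses only as far as the
next event, and it blocks if the underlying InputStream has no bytes
ready, which is the behaviour under discussion in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; none of these names come from Abdera.
public class PullSketch {

    // Pulls parse events on demand and collects at most `limit` feed titles.
    // Only <title> elements that are direct children of a <feed> are kept.
    public static List<String> feedTitles(InputStream in, int limit) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
            int depth = 0;       // current element nesting depth
            int feedDepth = -1;  // depth of the open <feed>, or -1 if none
            while (titles.size() < limit && reader.hasNext()) {
                // Blocking call: waits until enough bytes arrive for one event.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = reader.getLocalName();
                    if (feedDepth < 0 && name.equals("feed")) {
                        feedDepth = depth;
                    } else if (feedDepth > 0 && depth == feedDepth + 1
                            && name.equals("title")) {
                        // Assumes a text-only title; getElementText() reads
                        // through the closing tag, so undo the depth bump.
                        titles.add(reader.getElementText());
                        depth--;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == feedDepth) {
                        feedDepth = -1; // the current feed just closed
                    }
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed or truncated XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<atomStream>"
            + "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>One</title>"
            + "<entry><title>ignored</title></entry></feed>"
            + "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>Two</title></feed>"
            + "</atomStream>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(feedTitles(in, 10));
    }
}
```

The limit is what lets a caller walk away from an endless stream. The
parser itself never needs to see the end of the document, but nothing
here is nonblocking: a slow socket simply stalls inside next().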

- James

Garrett Rooney wrote:
> On 7/12/06, James M Snell <jasnell@gmail.com> wrote:
>> Right now documents are parsed incrementally.  When I call
>> Parser.INSTANCE.parse(...), what I'm handed back is not yet a fully
>> parsed document.  As I go through the various getters, the document is
>> parsed up to whatever point it needs to answer that method call.  It
>> could definitely be better, however.
> 
> Sure, but if you call getRoot() it's still going to block until it
> reads enough off of the InputStream to parse the root element, right?
> There's no nonblocking interface.
> 
> -garrett
>