Message-ID: <44B59A77.5020404@gmail.com>
Date: Wed, 12 Jul 2006 17:57:27 -0700
From: James M Snell
To: abdera-dev@incubator.apache.org
Subject: Re: Async Parsing?
References: <7edfeeef0607121523jef78d8ag2251de20256bea54@mail.gmail.com> <44B57958.1090608@gmail.com> <7edfeeef0607121539n5a1fea8cg105ece2b275c52ec@mail.gmail.com>
In-Reply-To: <7edfeeef0607121539n5a1fea8cg105ece2b275c52ec@mail.gmail.com>

I'm not sure I could reasonably envision any use of nonblocking i/o
operations in an xml parser.  I'm not sure if I've ever seen anyone do
it before.

In any case, I figured you might find this entertaining:

  http://www.snellspace.com/wp/?p=381

http://danga.com:8081/atom-stream.xml is a never-ending xml stream.

  URL url = new URL("http://danga.com:8081/atom-stream.xml");

  // we only care about the feed title and alternate link,
  // we'll ignore everything else
  ParseFilter filter = new WhiteListParseFilter();
  filter.add(new QName("atomStream"));
  filter.add(Constants.FEED);
  filter.add(Constants.TITLE);
  filter.add(Constants.LINK);
  ParserOptions options = Parser.INSTANCE.getDefaultParserOptions();
  options.setParseFilter(filter);

  Document doc = Parser.INSTANCE.parse(
    url.openStream(),(URI)null,options);
  Element el = doc.getRoot();

  // get the first feed in the stream, then continue to iterate
  // from there, printing the title and alt link to the console
  Feed feed = el.getFirstChild(Constants.FEED);
  while (feed != null) {
    System.out.println(
      feed.getTitle() + "\t" + feed.getAlternateLink().getHref());
    Feed next = feed.getNextSibling(Constants.FEED);
    feed.discard();
    feed = next;
  }

There are some memory-creep issues so I wouldn't recommend keeping this
running forever :-)

- James

Garrett Rooney wrote:
> On 7/12/06, James M Snell wrote:
>> Right now documents are parsed incrementally.  When I
>> Parser.INSTANCE.parse(...) what I'm handed back is not yet a fully
>> parsed document.  As I go through the various getters, the document is
>> parsed up to whatever point it needs to answer that method call.  It
>> could definitely be better, however.
>
> Sure, but if you call getRoot() it's still going to block until it
> reads enough off of the InputStream to parse the root element, right?
> There's no nonblocking interface.
>
> -garrett
>
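
The point in the quoted exchange (parsing is incremental, but each
pull still blocks on the InputStream) can be illustrated without
Abdera. What follows is only a rough sketch using the JDK's StAX pull
API, not Abdera's parser; the class name PullTitles, the method
firstTitles, and the tiny in-memory document are all made up for the
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; it uses StAX from the JDK, not Abdera.
public class PullTitles {

    // Collects the text of the first `limit` <title> elements and then
    // stops pulling, so the rest of the stream is never parsed.
    static List<String> firstTitles(InputStream in, int limit)
            throws XMLStreamException {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (titles.size() < limit && reader.hasNext()) {
                // next() is a blocking call: it returns only once the
                // stream has supplied enough bytes for one more event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in for a long-running feed stream (made-up sample data).
        String xml = "<atomStream>"
            + "<feed><title>one</title></feed>"
            + "<feed><title>two</title></feed>"
            + "<feed><title>three</title></feed>"
            + "</atomStream>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstTitles(in, 2)); // prints [one, two]
    }
}
```

The caller decides how far the parse advances, which is what makes the
approach incremental. Nothing here is nonblocking, though: against a
slow network stream, each next() call would simply wait for more data.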