From: Brad Neuberg
To: commons-dev@jakarta.apache.org
Date: Wed, 22 Sep 2004 17:34:56 -0700
Subject: [feedparser] Patch for Text America support, AOL Journal Atom support, and a refactored TestProbeLocator
Apologies if you get this twice; it was rejected the first time because I attached the patch to this email and it was too large.

This email describes several enhancements to the Jakarta Feed Parser. The patch file, in unified diff format, is located at http://codinginparadise.org/feedparser/textamerica.patch. Here are the enhancements:

1) TestProbeLocator.java - I've completely refactored this testing class to make it much more maintainable. It used to be huge, since it included a large number of tests. I've moved all of the testing code into one generic method, testSite, which is called by the test for each blog service, such as testXanga(), testLiveJournal(), etc. The code is much more readable and understandable now. I've also added tests for AOL Journal's new Atom support (see point 2 below) and for our new Text America support (see point 3 below).

2) AOL Journal - AOL Journal recently turned on Atom support (in addition to their existing RSS support). This is exposed in autodiscovery. I've updated the probe locator system so that it can aggressively find the Atom feed if autodiscovery fails (but only if aggressive discovery is turned on; it is off by default). It turns out that AOL's Atom support is still a bit sketchy: an HTTP GET on the Atom file works, but an HTTP HEAD request fails with an HTTP 500 Internal Server Error. This does not happen for HEAD requests on the RSS file. As a result, aggressive autodiscovery will not work for Atom, because it issues a HEAD request instead of a GET for performance reasons; I've updated the code to do the probing, but it will fail for now. This doesn't break anything, since it then tries to retrieve the RSS file and succeeds. If AOL fixes this bug, the code will automatically find and prefer the Atom feed, which is good.
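The probing behavior in point 2 can be sketched roughly as follows. This is a hypothetical illustration, not code from the patch: the class and method names (AggressiveProbeSketch, probe, headSucceeds) and the example.com URLs are assumptions. Candidate feed URLs are checked in preference order, Atom before RSS, and the first one whose check succeeds wins. The check is passed in as a parameter so a fake can stand in for the network; headSucceeds shows what a real HEAD-based check might look like with java.net.HttpURLConnection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of aggressive feed probing; names are illustrative,
// not the Feed Parser's real API.
public class AggressiveProbeSketch {

    // Issues an HTTP HEAD request and reports whether it returned a 2xx status.
    // HEAD is used instead of GET so the feed body is not downloaded.
    static boolean headSucceeds(String url) {
        HttpURLConnection conn = null;
        try {
            URLConnection c = URI.create(url).toURL().openConnection();
            if (!(c instanceof HttpURLConnection)) {
                return false; // not an http(s) URL
            }
            conn = (HttpURLConnection) c;
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            return status >= 200 && status < 300; // e.g. a 500 counts as failure
        } catch (IOException | IllegalArgumentException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Returns the first candidate, in preference order, accepted by the check,
    // or null when none is accepted.
    static String probe(List<String> candidates, Predicate<String> check) {
        for (String url : candidates) {
            if (check.test(url)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Atom is listed first so it is preferred whenever its check succeeds.
        List<String> candidates = Arrays.asList(
                "http://example.com/journal/atom.xml",
                "http://example.com/journal/rss.xml");

        // Fake check mimicking the situation described above: HEAD on the
        // Atom file fails, HEAD on the RSS file succeeds.
        Predicate<String> atomHeadBroken = url -> url.endsWith("rss.xml");
        System.out.println(probe(candidates, atomHeadBroken));

        // Once HEAD on the Atom file works, Atom is picked automatically.
        Predicate<String> atomHeadFixed = url -> true;
        System.out.println(probe(candidates, atomHeadFixed));
    }
}
```

Running main prints the RSS URL first and then the Atom URL. In real use, AggressiveProbeSketch::headSucceeds would be passed as the check instead of the fakes.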
3) Text America Support - Text America is a mobile blogging service (http://www.textamerica.com). They have an evil bug, though: they have autodiscovery for their feeds, but it points to the wrong location! To handle this, I added BlogService.TEXTAMERICA, gave BlogService a property named hasValidAutodiscovery, which indicates whether we can trust the values returned by autodiscovery for a particular blog service, and added a hasValidAutodiscovery() method to read this value. I then modified FeedLocator to always call ProbeLocator, and changed ProbeLocator to "fail fast" if we already have some results (found through autodiscovery, for example) AND to call blogService.hasValidAutodiscovery(). If this returns false, we go ahead and do aggressive link probing and clear out the list of existing, incorrect autodiscovery links.

I've run TestProbeLocator and everything passes.

Brad Neuberg, bkn3@columbia.edu
Senior Software Engineer, Rojo Networks
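The decision logic in point 3 above can be sketched roughly as follows. This is illustrative only: BlogService is modeled here as a Java enum for brevity, and the XANGA and LIVEJOURNAL constants, the locate method, and the probe function are hypothetical stand-ins rather than the patch's actual code. Autodiscovered links are returned immediately ("fail fast") only when some exist and the service's autodiscovery is trusted; otherwise they are discarded in favor of probed links.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the Text America handling; only BlogService,
// TEXTAMERICA and hasValidAutodiscovery come from the email.
public class TrustedAutodiscoverySketch {

    enum BlogService {
        XANGA(true),
        LIVEJOURNAL(true),
        TEXTAMERICA(false); // autodiscovery points to the wrong location

        private final boolean hasValidAutodiscovery;

        BlogService(boolean hasValidAutodiscovery) {
            this.hasValidAutodiscovery = hasValidAutodiscovery;
        }

        boolean hasValidAutodiscovery() {
            return hasValidAutodiscovery;
        }
    }

    // Always called after autodiscovery. Fails fast when trusted
    // autodiscovery results already exist; otherwise the autodiscovered
    // links (if any) are dropped and replaced by whatever probing finds.
    static List<String> locate(BlogService service, List<String> autodiscovered,
                               Function<BlogService, List<String>> probe) {
        if (service.hasValidAutodiscovery() && !autodiscovered.isEmpty()) {
            return autodiscovered;
        }
        return new ArrayList<>(probe.apply(service));
    }

    public static void main(String[] args) {
        // Fake prober standing in for aggressive link probing.
        Function<BlogService, List<String>> fakeProbe =
                s -> Arrays.asList("http://example.com/probed/rss.xml");
        List<String> found = Arrays.asList("http://example.com/autodiscovered/rss.xml");

        System.out.println(locate(BlogService.XANGA, found, fakeProbe));       // trusted: kept
        System.out.println(locate(BlogService.TEXTAMERICA, found, fakeProbe)); // untrusted: replaced
        System.out.println(locate(BlogService.XANGA, Collections.emptyList(), fakeProbe)); // nothing found: probed
    }
}
```

Keeping the trust flag on the blog service means a new misbehaving service only needs its flag set to false; the locator logic itself does not change.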