abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Graceful handling of non-atom 1.0 feeds
Date Sun, 11 Jun 2006 02:41:23 GMT
Abdera will successfully parse any well-formed XML.  The trick is not to
use generics when parsing.

Document doc = Parser.INSTANCE.parse(someInputStream);

The parser will automatically detect whether the XML stream is an Atom
document (Feed, Entry or Atom Publishing Protocol Introspection doc) or
whether it is some other XML.

Element element = doc.getRoot();

if (element instanceof Feed) {
  // it was an Atom Feed document
}
if (element instanceof Entry) {
  // it was an Atom Entry document
}
if (element instanceof Service) {
  // it was an APP Introspection document
}
if (element instanceof ExtensionElement) {
  // it was arbitrary XML
}

More below.

Paul Querna wrote:
> Garrett Rooney wrote:
>> In my experiments with pulling titles out of atom feeds last night, I
>> inadvertently pointed my PrintTitles program at some atom 0.3 feeds.
>> The results were, well, explosive.
>> Now I'm not saying we should parse those feeds, we should really
>> restrict ourselves to atom 1.0, but it might be nice if we at least
>> recognize them when we encounter them, so we can throw something more
>> informative than a ClassCastException (the usual result) or
>> NullPointerException (if you've got a ParseFilter set up).

The NPE is likely a bug. That shouldn't happen.  The ClassCastException
is likely caused by the use of generics.  Atom 0.3 and RSS 1.x/2.x will
be parsed as Document<ExtensionElement> (i.e. doc.getRoot() should
return an instance of FOMExtensionElement).
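
As a rough illustration of how a caller could recognize Atom 0.3 or RSS
before casting, here is a minimal sketch that looks only at the root
element's namespace and name. It deliberately uses the JDK's StAX reader
rather than Abdera, so it is not how Abdera's parser makes the decision.
FeedSniffer, Kind and sniff are hypothetical names invented for this
example, and the RSS checks are heuristics (an <rss> root can also be
RSS 0.9x, and an rdf:RDF root is not necessarily RSS 1.0).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper, not part of Abdera: classifies a feed by its root element. */
public class FeedSniffer {

    public enum Kind { ATOM_1_0, ATOM_0_3, RSS_1_X, RSS_2_X, OTHER_XML }

    private static final String ATOM_1_0_NS = "http://www.w3.org/2005/Atom";
    private static final String ATOM_0_3_NS = "http://purl.org/atom/ns#";
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Reads up to the first start tag and classifies the document. */
    public static Kind sniff(InputStream in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Feeds from the open Internet are untrusted, so turn DTD support off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return classify(reader.getNamespaceURI(), reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
            throw new IllegalArgumentException("no root element found");
        } catch (XMLStreamException e) {
            // Well-formed XML is the minimum requirement here as well.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Convenience overload for XML held in a String. */
    public static Kind sniff(String xml) {
        return sniff(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    private static Kind classify(String ns, String localName) {
        if (ATOM_1_0_NS.equals(ns)) return Kind.ATOM_1_0;
        if (ATOM_0_3_NS.equals(ns)) return Kind.ATOM_0_3;
        // Heuristic: RSS 1.x documents use an rdf:RDF root element.
        if (RDF_NS.equals(ns) && "RDF".equals(localName)) return Kind.RSS_1_X;
        // Heuristic: RSS 0.9x/2.x use an <rss> root with no namespace.
        boolean noNamespace = ns == null || ns.isEmpty();
        if (noNamespace && "rss".equals(localName)) return Kind.RSS_2_X;
        return Kind.OTHER_XML;
    }

    public static void main(String[] args) {
        String atom03 = "<feed version=\"0.3\" xmlns=\"http://purl.org/atom/ns#\"/>";
        System.out.println(sniff(atom03));
    }
}
```

A program like PrintTitles could run a check of this kind first and
report "this is an Atom 0.3 feed" instead of failing on a cast.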

> +1.
> As more of a policy issue, do people think Abdera should attempt to
> successfully parse content, even if it contains errors/violations of
> the spec?

The parser is currently very liberal.  It will make sure that Atom Date
Constructs are at least in ISO 8601 format and will validate URIs, but
everything else is left wide open.  The absolute minimum it requires is
well-formed XML.  A broad spectrum of Atom spec violations are allowed.

We don't attempt to correct any of those errors, however.  For example,
if someone puts escaped HTML markup in a text construct that is marked
as text, Abdera will represent that data as plain text.
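
To give a feel for the two checks mentioned above (dates and URIs), here
is a minimal sketch using only JDK classes. It is an assumption-laden
stand-in, not Abdera's own validation code: LenientChecks and its methods
are hypothetical names, and the date check accepts only the full
date-time-with-offset profile that Atom uses, which is narrower than ISO
8601 as a whole.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper, not part of Abdera: the two "minimum" checks. */
public class LenientChecks {

    /** True if the text is a full date-time with an offset, e.g. 2006-06-11T02:41:23Z. */
    public static boolean isIsoDateTime(String text) {
        try {
            OffsetDateTime.parse(text.trim(), DateTimeFormatter.ISO_OFFSET_DATE_TIME);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** True if java.net.URI accepts the text as a syntactically valid URI reference. */
    public static boolean isUri(String text) {
        try {
            new URI(text.trim());
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIsoDateTime("2006-06-11T02:41:23Z"));
        System.out.println(isIsoDateTime("Sun, 11 Jun 2006 02:41:23 GMT"));
        System.out.println(isUri("http://example.org/a b"));
    }
}
```

Everything beyond checks of this sort (required elements, content
types, and so on) would pass through untouched under the liberal policy
described above.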

> Someone somewhere out on the Internet will break the spec, produce
> invalid XML, put invalid encodings in there, miss required fields, put
> invalid data in those fields, etc.  While some of these problems will
> require support from lower level components (Axiom), much of the handling
> is still up to Abdera.
> -Paul

- James
