lucene-solr-user mailing list archives

From Adrian Sutton
Subject Re: unable to figure out nutch type highlighting in solr....
Date Sat, 06 Oct 2007 00:09:30 GMT
> Named entity references are valid in XML.  They just need to be
> declared before they are used[1], unless they are one of the builtin
> named entities &lt; &gt; &apos; &quot; or &amp; -- these are always
> valid when parsing with an XML parser.

Correct, it was an offhand comment and I skipped over all the
details. In general, named entities other than the built-ins aren't
declared at the top of the file, and many parsers don't bother to read
in external DTDs. Any entities declared there are never read and are
therefore considered invalid.

> XHTML is XML, so if parsed by an XML parser, XML's builtin named
> entities are available, and if the parser doesn't ignore external
> entities, then the same set of (roughly) 250 named entities defined in
> HTML are available as well[2].

Except that no browser that I know of actually reads in the XHTML DTD
when in standards-compliant mode, so none of those entities can
actually be used unless you include the declarations for them at the
top of every XHTML document (which is ludicrous).
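
To make that concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree, which (like the parsers described above) does not
fetch an external DTD. The two sample documents and the parse_text helper
are made up for illustration; only the ElementTree calls are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment that declares nbsp itself in the internal DTD subset.
WITH_DECL = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'

# The same fragment with no declaration anywhere the parser will look.
WITHOUT_DECL = '<p>a&nbsp;b</p>'


def parse_text(doc):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        # Raised for the undeclared &nbsp; reference.
        return None


declared = parse_text(WITH_DECL)
undeclared = parse_text(WITHOUT_DECL)
```

With the declaration in place the reference resolves to U+00A0; without
it the parse fails outright, which is why you would have to repeat the
declarations in every document.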

The bottom line is that it's far, far better to use numeric character
references (&#160; rather than &nbsp;, say) in XML and simply ignore
all but the built-in named entities if you want to have any confidence
that the document will be parsed correctly - hence my offhand comment.
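
One way to apply this is to rewrite named references as numeric ones
before the text ever reaches an XML parser. The sketch below is only an
illustration: to_numeric and XML_BUILTINS are names I made up, and it
assumes the HTML 4 entity table in Python's html.entities is the set you
care about.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five named entities every XML parser accepts; leave these untouched.
XML_BUILTINS = {"lt", "gt", "amp", "apos", "quot"}


def to_numeric(text):
    """Replace HTML named entity references with numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_BUILTINS or name not in name2codepoint:
            # Built-in or unknown name: keep the reference exactly as written.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


converted = to_numeric("a&nbsp;b &amp; &copy;")
parsed = ET.fromstring("<p>" + converted + "</p>").text
```

Unknown names are deliberately passed through unchanged so that the
parser still reports them rather than having them silently dropped.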


Adrian Sutton
