lucene-solr-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: SimplePostTool error (UNCLASSIFIED)
Date Fri, 15 Jul 2016 16:46:04 GMT
On Fri, Jul 15, 2016 at 12:29 PM, Erick Erickson
<erickerickson@gmail.com> wrote:
> simplePostTool is just that, simple. It's intended to get you started.
> It is not a full-featured web crawler. As such, if you're encountering
> wonky web pages that are not well formed HTML there's no guarantee
> that it'll handle them gracefully.

HTML is not well-formed XML, though.  Hopefully we're not using an XML
parser to try to parse HTML?
The error message "XML document structures must start and end within
the same entity." describes a requirement of XML, not of HTML.
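
To illustrate the distinction, here is a minimal sketch in Python (not the
SimplePostTool code, which is Java). It assumes a made-up sample page in
which <p> and <br> are never closed, which HTML permits. The names
SAMPLE_HTML, parses_as_xml, TextCollector and extract_text_as_html are
hypothetical, chosen only for this example. The strict XML parser rejects
the input, while a lenient HTML parser still recovers the text. Python's
XML parser words its error differently from the message quoted above.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: acceptable HTML, but <p> and <br> are never closed,
# so it is not well-formed XML.
SAMPLE_HTML = "<html><body><p>one<br>two</body></html>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for unclosed or mismatched tags, among other things.
        return False


class TextCollector(HTMLParser):
    """Collect text nodes using a lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text_as_html(text):
    """Return the concatenated text content, tolerating unclosed tags."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print("accepted by XML parser:", parses_as_xml(SAMPLE_HTML))
    print("text via HTML parser:", extract_text_as_html(SAMPLE_HTML))
```

If the tool is feeding fetched pages to an XML parser, any page like the
sample would fail in this way regardless of how "wonky" it is by HTML
standards.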

-Yonik
