lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: detailed Error reporting in Solr
Date Fri, 05 Apr 2013 17:27:20 GMT
It is not a bug. XML parsers are required to reject documents with undefined character entities.

Try parsing it as HTML or XHTML.
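A minimal sketch in Python (not Solr or Tika code) of the difference described above. It assumes a list item like the one in the quoted message. A strict XML parser, here the standard library's ElementTree backed by expat, rejects `&nbsp;` and `&rsaquo;` because XML predefines only `&lt;`, `&gt;`, `&amp;`, `&quot;` and `&apos;`. An HTML parser resolves them from its built-in entity table. The snippet text and the helper names (`parse_as_xml`, `TextCollector`, `extract_text_as_html`) are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment modeled on the crawled page: HTML entities in a list item.
SNIPPET = "<ul><li>Home&nbsp;&rsaquo;</li></ul>"


def parse_as_xml(text):
    """Try a strict XML parse; return the root element or the ParseError raised."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # expat rejects &nbsp; and &rsaquo;: XML predefines only five entities
        # and this fragment declares no others.
        return err


class TextCollector(HTMLParser):
    """Hypothetical helper that gathers text nodes with entities already decoded."""

    def __init__(self):
        # convert_charrefs=True makes the parser resolve &nbsp;, &rsaquo;, etc.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text_as_html(text):
    """Parse leniently as HTML and return the concatenated text content."""
    collector = TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(repr(parse_as_xml(SNIPPET)))
    print(repr(extract_text_as_html(SNIPPET)))
```

Run as a script, the XML path reports a `ParseError`. The HTML path returns the text with the two entities turned into U+00A0 (no-break space) and U+203A (single right-pointing angle quotation mark).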

wunder

On Apr 4, 2013, at 11:14 AM, eShard wrote:

> Yes, that's it exactly.
> I crawled a link with these (&nbsp;&rsaquo;) in each list item, and Solr
> couldn't handle it: it threw the XML parse error and the crawler terminated
> the job.
> 
> Is this fixable? Or do I have to submit a bug to the Tika folks?
> 
> Thanks,
> 