lucene-solr-user mailing list archives

From eShard
Subject detailed Error reporting in Solr
Date Thu, 04 Apr 2013 14:23:35 GMT
Good morning,
I'm currently running Solr 4.0 final with Tika 1.2 and a ManifoldCF 1.2 dev build,
and I'm battling Tika XML parse errors again.
Solr reports only this error, which is too vague:
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error
I had to manually run the link through the Tika app to get a much more
detailed error:
Caused by: org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 105;
The entity "nbsp" was referenced, but not declared.
So the HTML contains old-school non-breaking space entities (&nbsp;) that Tika
can't handle.

For example: <li> Cyber Systems and Technology&nbsp;&rsaquo;
</mission/CST/CST.html>   </li>
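
To illustrate the failure mode outside of Solr and Tika, here is a minimal Python sketch. It uses only the Python standard library, not Tika's parser, and the helper names (numeric_entities, parses) are made up for this example. It shows that a strict XML parser rejects HTML named entities such as &nbsp; when no DTD declares them, and that rewriting them as numeric character references is one possible preprocessing workaround. Whether such a step can be placed in front of Tika in a ManifoldCF/Solr pipeline is an assumption, not something confirmed here.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# Fragment modeled on the snippet above, wrapped in a root element (assumption).
FRAGMENT = "<ul><li> Cyber Systems and Technology&nbsp;&rsaquo; </li></ul>"

# The five entities XML predefines; a strict parser already accepts these.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Rewrite HTML named entities as numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave it untouched
        return "&#%d;" % codepoint
    return ENTITY_RE.sub(repl, text)


def parses(text):
    """Return True if the text is accepted by a strict XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("raw fragment parses:      ", parses(FRAGMENT))
    print("rewritten fragment parses:", parses(numeric_entities(FRAGMENT)))
```

The raw fragment fails for the same underlying reason as the SAXParseException above: nbsp is an HTML entity, not one of the five that XML defines by default. After the rewrite, &nbsp; becomes &#160; and &rsaquo; becomes &#8250;, which any XML parser accepts without a DTD.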

My question is twofold:
1) How do I get Solr to report more detailed errors, and
2) How do I get Tika to accept (or ignore) &nbsp;?

