lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: How to index text field with html entities ?
Date Fri, 29 Jul 2016 22:43:22 GMT
On 7/29/2016 4:05 PM, Bruno Mannina wrote:
> after checking my log it seems that it concerns only some html entities.
> No problem with &amp; but I have problem with:
> &uuml;
> &ldquo;
> etc...

Those are valid *HTML* entities, but they are not valid *XML* entities.
The list of entities that are predefined in XML is quite short -- there are
only five of them: &amp; &lt; &gt; &quot; and &apos;.
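
A quick way to see the difference is sketched below. It uses Python's standard-library XML parser, not the parser Solr itself uses, so it only illustrates the general XML behavior; the helper name `try_parse` is made up for this example.

```python
import xml.etree.ElementTree as ET

def try_parse(fragment: str):
    """Return the decoded text of <f>...</f>, or None if the XML is rejected."""
    try:
        return ET.fromstring("<f>" + fragment + "</f>").text
    except ET.ParseError:
        # An entity that is not declared (e.g. an HTML-only one) ends up here.
        return None

# The five predefined XML entities are decoded by the parser.
predefined = try_parse("&amp;&lt;&gt;&quot;&apos;")

# An HTML-only entity is undefined in plain XML, so parsing fails.
html_only = try_parse("&uuml;")
```

With a plain XML document and no DTD declaring extra entities, `predefined` holds the five decoded characters and `html_only` is None.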

When Solr processes XML, it is only going to convert entities that are
valid for XML -- the five predefined ones.  It will fail on the other
named entities that exist only in HTML (roughly 250 in HTML 4, and many
more in HTML5).
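
One possible workaround, offered as a sketch and not as anything Solr-specific, is to decode the HTML entities into real Unicode characters on the client side and then escape only what XML requires before building the update document. The function name `html_text_to_xml_text` and the sample string are assumptions made for this illustration.

```python
import html
from xml.sax.saxutils import escape

def html_text_to_xml_text(raw: str) -> str:
    """Turn text containing HTML entities into text that is safe inside XML."""
    # Decode HTML named and numeric references (&uuml;, &ldquo;, ...) to Unicode.
    decoded = html.unescape(raw)
    # Re-escape only the characters XML requires: &, < and >.
    return escape(decoded)

sample = "M&uuml;ller &amp; Co. said &ldquo;hello&rdquo;"
field_text = html_text_to_xml_text(sample)
```

Here `field_text` contains the literal characters ü, “ and ” plus a single `&amp;`, so it can be placed inside a field element of an XML update, provided the document is sent in an encoding such as UTF-8 that can represent those characters.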

If you are seeing the problem with &amp; (which is one of the five valid
XML entities) then we'll need the Solr version and the full error
message/stacktrace from the solr logfile.

