lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: How to index text field with html entities ?
Date Fri, 29 Jul 2016 22:43:22 GMT
On 7/29/2016 4:05 PM, Bruno Mannina wrote:
> after checking my log it seems that it concerns only some html entities.
> No problem with &amp; but I have problem with:
>
> &uuml;
> &ldquo;
> etc...

Those are valid *HTML* entities, but they are not valid *XML* entities. 
The list of entities that are valid in XML is quite short -- there are
only five of them.

https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

When Solr processes XML, it is only going to convert entities that are
valid for XML -- the five already mentioned.  It will fail on the other
named entities (roughly 250 in HTML 4), which are only valid for HTML.
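
One way to handle this is to decode the HTML entities before the text
ever goes into the XML you send to Solr.  The following is only a rough
sketch in Python, not something taken from your setup: the helper names
(to_xml_safe, parses_as_xml) and the "title" field name are made up for
illustration, and it uses the standard library XML parser as a stand-in
for what a strict XML parser will do with the input.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_safe(text):
    """Decode HTML entities to Unicode, then escape only what XML requires."""
    decoded = html.unescape(text)   # &uuml; -> ü, &amp; -> &
    return escape(decoded)          # & -> &amp;, < -> &lt;, > -> &gt;


def parses_as_xml(field_value):
    """Return True if the value parses inside a (hypothetical) <field> element."""
    doc = '<add><doc><field name="title">' + field_value + '</field></doc></add>'
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # HTML-only entities such as &uuml; are undefined in plain XML.
        return False


raw = "M&uuml;ller &amp; Sons &ldquo;quoted&rdquo;"
safe = to_xml_safe(raw)

print(parses_as_xml(raw))   # the HTML-only entities make the parse fail
print(parses_as_xml(safe))  # only XML-valid escapes remain
```

After the conversion the text contains the literal characters (ü and
the curly quotes) and only the escapes XML itself understands, so the
document should be sent as UTF-8.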

If you are seeing the problem with &amp; (which is one of the five valid
XML entities) then we'll need the Solr version and the full error
message/stacktrace from the solr logfile.

Thanks,
Shawn

