lucene-solr-user mailing list archives

From Fabio Confalonieri
Subject International Charsets in embedded XML
Date Tue, 13 Jun 2006 13:06:48 GMT

(Sorry, the last message was posted by mistake.)

Here I am again with charset encoding problems:

I need to store XML in a document field. I declare the field as a string and
wrap the value in CDATA when I post the add XML.
The problem is that the XML contains some international characters, say ì or à,
and also € (I don't know if you can read these).
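
For reference, a minimal sketch of the kind of add message described above. The class and method names (AddXmlSketch, cdata, buildAdd, toUtf8) are hypothetical and not part of Solr; the field name facetXML is taken from the response shown further down. It assumes the request body is meant to be UTF-8 and uses StandardCharsets, which needs Java 7 or later. The non-ASCII characters are written as \u escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr: builds an <add> message whose
// field value is wrapped in CDATA, then encodes it explicitly as UTF-8.
class AddXmlSketch {

    // A CDATA section cannot contain "]]>", so any occurrence is split
    // across two adjacent CDATA sections.
    static String cdata(String value) {
        return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Assumes fieldName needs no attribute escaping (plain ASCII name).
    static String buildAdd(String fieldName, String embeddedXml) {
        return "<add><doc><field name=\"" + fieldName + "\">"
                + cdata(embeddedXml)
                + "</field></doc></add>";
    }

    // Name the charset explicitly; getBytes() without an argument
    // falls back to the platform default encoding.
    static byte[] toUtf8(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00ec is ì and \u20ac is the euro sign.
        String embedded = "<item value=\"a\">Autocaravan \u00ecMansardato \u20ac</item>";
        String add = buildAdd("facetXML", embedded);
        byte[] body = toUtf8(add);
        System.out.println(add);
        System.out.println(body.length + " bytes to post");
    }
}
```

Whatever sends these bytes should also declare the same charset in the request's content type, so that both sides agree on the encoding.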

When I get the XML field back from Solr, strange things happen:

- First: € gets converted to ? (I can see this in the index when looking with Luke).

- If there is an ì (accented i), I get malformed XML back, in both Firefox
and IE:

<?xml version="1.0" encoding="UTF-8"?>
  <result numFound="1" start="0">
      <str name="categoryid">/relazioni/</str>
      <str name="facetXML">&lt;?xml version="1.0" encoding="UTF-8"?>&lt;xml>
	&lt;filter field="typecamper_s">
	&lt;item value="autocaravanmansardato">Autocaravan ìMansardato</item>
							                           ^ HERE the problem begins: from this point on,
"<" is no longer escaped

	<item value="semintegrale">Semintegrale</item>
	HERE is how the output should have continued, with "<" escaped, after the
problem above:
	&lt;/item>&lt;item value="semintegrale">Semintegrale&lt;/item>&lt;/filter>

But if I get the same document in my request handler (as a Document
structure), I have no problem parsing the XML and I get the correct content.
I have traced XML.escape and the problem is not there, so it's somewhere
between XMLWriter and Jetty (I've tried the latest version, 5.1.11).

- If I put some international characters in a normal string field, I see that
Solr stores them UTF-8 encoded (I think), in a string field as well as in a
text field type.
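
To illustrate the general mechanism behind symptoms like these, here is a small sketch of what happens in Java when text is encoded with one charset and decoded with another. This is not a diagnosis of what Solr or Jetty does; the class name CharsetRoundTrip and its roundTrip helper are hypothetical, and the choice of ISO-8859-1 as the "wrong" charset is only an assumption for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch between the
// writer and the reader of a byte stream changes the text.
class CharsetRoundTrip {

    // Encode with one charset and decode with another, as happens when
    // two sides of a connection disagree about the encoding.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String euro = "\u20ac";    // euro sign
        String iGrave = "\u00ec";  // ì

        // ISO-8859-1 has no euro sign, so the encoder substitutes '?'.
        System.out.println(roundTrip(euro,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // UTF-8 bytes read back as ISO-8859-1 turn one character into two.
        System.out.println(roundTrip(iGrave,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length());

        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(euro + iGrave,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(euro + iGrave));
    }
}
```

The first case matches the € to ? symptom in form only: any step that encodes through a charset lacking the euro sign would produce it, and the sketch does not say which step that is here.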

The question is: apart from the malformed XML issue, what is the best way
to deal with international charsets?

Thank you
