lucene-solr-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: CDATA response is coming with "&lt:" instead of "<"
Date Tue, 21 Apr 2015 15:38:07 GMT
On Tue, Apr 21, 2015 at 9:46 AM, mesenthil1
<senthilkumar.arumugam@viacomcontractor.com> wrote:
> We are using DIH for indexing XML files. As part of the xml we have xml
> enclosed with CDATA. It is getting indexed but in response the CDATA content
> is coming as decoded terms instead of symbols.

Your problem is ambiguous since we can't tell what is data, and what
is markup (transfer syntax).

If you were to index this same data using JSON, what would you pass?
Is it this:
"<Images><image><uri>..."
Or is it this?
"<![CDATA[<Images><image><uri>..."

If it's the former, you're already set - it's working that way now.
If it's the latter, then when you index it as XML you will need to
escape it like any other XML value. Otherwise the XML parser will
strip the CDATA markers before the value reaches the indexing part of Solr.
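
To make the two cases concrete, here is a minimal sketch using Python's
standard-library XML parser as a stand-in for whatever parser sits in front
of Solr. The <field> element, the field_text helper, and the payload value
are hypothetical and only for illustration; the point is what any conforming
XML parser hands to the application.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def field_text(doc_xml: str) -> str:
    """Return the text of the root element as the XML parser delivers it."""
    return ET.fromstring(doc_xml).text


# Hypothetical payload standing in for the real <Images> content.
payload = "<Images><image><uri>example</uri></image></Images>"

# Case 1: CDATA is only transfer syntax. The parser drops the wrapper
# and the application sees the bare payload.
wrapped = f"<field><![CDATA[{payload}]]></field>"

# Case 2: the CDATA markers are part of the data itself, so the whole
# literal has to be escaped like any other XML value.
literal = f"<![CDATA[{payload}]]>"
escaped = f"<field>{escape(literal)}</field>"

print(field_text(wrapped))  # the payload without any CDATA markers
print(field_text(escaped))  # the literal, CDATA markers included
```

In the first case the stored value starts with "<Images>"; in the second it
starts with "<![CDATA[". Which one is right depends on what the data is
supposed to be.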

-Yonik
