lucene-solr-user mailing list archives

From Thorsten Scherler <thors...@apache.org>
Subject Re: How to tell the highlighter not to escape?
Date Wed, 03 Jan 2007 10:11:17 GMT
On Wed, 2007-01-03 at 02:16 +0000, Edward Garrett wrote:
> thorsten,
> 
> see the following for discussion. your case is indeed an annoyance--the
> thread below discusses motivations for it and ways of working around it. (i
> too confess that i wish it were not so.)
> 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg01483.html

Thanks Edward. The problem with the suggestion in the above thread, namely
"just create an XSL that
generates XML and unescapes the fields you know will contain wellformed
XML data -- then apply your second transform client side"

is that it is not possible with XSL. See e.g. http://www.biglist.com/lists/xsl-list/archives/200109/msg00318.html
"> How can I match the Cdata Section?!?
>
You can't, the XPath data model regards CDATA as merely an input shortcut,
not as an information-bearing part of the XML content. In other words,
"<![CDATA[x]]>" and "x" look exactly the same to the XSLT processor.

Mike Kay"

Michael Kay is the XSL guru, and I can say from my own experience as well
that one would need to write a custom parser, since <![CDATA[<em>TERM</em>]]>
is equivalent to &lt;em&gt;TERM&lt;/em&gt;, and in XSL this is a plain string
(XPath would match it as text()).
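
As a small illustration of that equivalence, here is a sketch in Python
rather than XSL, using only the standard library's xml.etree.ElementTree.
The <str> wrapper element is an assumption made for the example, not
necessarily what Solr returns. Both spellings parse to the same text
value with no child elements, and only an explicit second parse turns
the string back into markup:

```python
import xml.etree.ElementTree as ET

# Two ways of writing the same highlighted fragment inside an XML response.
# The <str> wrapper is a hypothetical element name used only for this sketch.
cdata_doc = "<str><![CDATA[<em>TERM</em>]]></str>"
escaped_doc = "<str>&lt;em&gt;TERM&lt;/em&gt;</str>"

cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text

# After parsing, both forms are the identical string "<em>TERM</em>".
print(cdata_text == escaped_text)

# Neither form produces an <em> child element; it is text only.
child_count = len(ET.fromstring(cdata_doc))
print(child_count)

# To get real markup back, the string has to be parsed a second time.
# This only works if the fragment is well-formed XML on its own.
fragment = ET.fromstring(cdata_text)
print(fragment.tag, fragment.text)
```

The second parse is the extra step that a plain XSL transform over the
response does not give you, which is why I talk about a custom parser.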

IMO the highlighter should really return pure XML and not escape it.
I will have a look at the XmlResponseWriter; maybe I can find a way to change this.

salu2


> 
> -edward
> 
> On 1/2/07, Mike Klaas <mike.klaas@gmail.com> wrote:
> >
> > Hi Thorsten,
> >
> > The highlighter does not escape anything itself: you are seeing the
> > results of solr's automatic escaping of xml data within its xml
> > response.  This should be transparent (your xml decoder should
> > un-escape the values on the way out).  I'm not really familiar with
> > xslt so I'm unsure why that isn't so (perhaps it is automatically
> > html-escaping the values after un-xml-escaping them?)
> >
> > Be careful of documents containing html fragments natively.
> >
> > cheers,
> > -MIke
> >
> > On 1/2/07, Thorsten Scherler <thorsten.scherler.ext@juntadeandalucia.es>
> > wrote:
> > > Hi all,
> > >
> > > I am playing around with the highlighter and found that all highlight
> > > terms get escaped.
> > >
> > > I mean solr will return
> > >  &lt;em&gt;TERM&lt;/em&gt; and not
> > > <em> TERM </em>
> > >
> > > I am not sure where this escaping is happening but I would need the
> > > highlighting to NOT escape the hl.simple.pre and hl.simple.post tag
> > > since it is horror to work with cdata sections in xsl.
> > >
> > > I had a look in the lucene highlighter and it seems that it does not
> > > escape the tags.
> > >
> > > Can somebody point me to code which is responsible for escaping and
> > > maybe give me a tip how I can patch to make it configurable.
> > >
> > > TIA
> > >
> > > salu2
> > >
> > >
> >
> 
> 
> 
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)


