lucene-solr-user mailing list archives

From "Jérôme Etévé" <jer...@eteve.net>
Subject Re: Problem with html code inside xml
Date Tue, 25 Sep 2007 11:06:03 GMT
If I understand correctly, you want to store the raw HTML in Solr, like this
(in the XML file you post):

<field name="storyFullText">
  <html></html>
</field>

I think you should escape your content so that these XML special characters
are not parsed as markup:
<  ->  &lt;
>  ->  &gt;
"  ->  &quot;
&  ->  &amp;

If you use Perl, have a look at HTML::Entities.
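
For illustration, here is a minimal sketch of the same escaping in Python
rather than Perl, using only the standard library. The helper name
build_field and the sample HTML are assumptions made up for this example;
only the field name storyFullText comes from your schema. It is a sketch of
the idea, not a complete posting script.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_field(name, raw_html):
    """Return a Solr-style <field> element with raw_html escaped as text.

    build_field is a hypothetical helper for this example, not a Solr API.
    """
    # escape() replaces &, < and > ; the extra mapping also covers the
    # double quote, matching the four replacements listed above.
    body = escape(raw_html, {'"': "&quot;"})
    return '<field name="{}">{}</field>'.format(name, body)


# Made-up sample content, similar in shape to the quoted source.
raw = '<div class="paragraph">my para text here<br/></div>'
field_xml = build_field("storyFullText", raw)
print(field_xml)

# An XML parser now sees the HTML as plain text and returns it unchanged.
assert ET.fromstring(field_xml).text == raw
```

Because the markup reaches the XML parser as character data, the stored
value should come back as the original HTML string, div tags included.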


On 9/25/07, steve.christin@gmail.com <steve.christin@gmail.com> wrote:
> Hello,
>
> I've got a problem with HTML code that is embedded in an XML file:
>
> Sample source:
>
> <content>
>         <stories>
>                 <div class="storyTitle">
>                          Les débats
>                 </div>
>                 <div class="storyIntroductionText">
>                         Le premier tour des élections fédérales se déroulera le 21
> octobre prochain. D'ici là, La 1ère vous propose plusieurs rendez-
> vous, dont plusieurs grands débats à l'enseigne de Forums.
>                 </div>
>                 <div class="paragraph">
>                         <div class="paragraphTitle"/>
>                         <div class="paragraphText">
>                                 my para textehere
>                                 <br/>
>                                 <br/>
>                                 Vous trouverez sur cette page toutes les dates et les
> heures de ces différents rendez-vous ainsi que le nom et les partis des
> débatteurs. De plus, vous pourrez également écouter ou réécouter
> l'ensemble de ces émissions.
>                         </div>
>                 </div>
> ....
> ---------
> When I run a query on Solr, I get something like this in the
> source of the XML result:
>
> <td xmlns="http://www.w3.org/1999/xhtml">
> <span class="markup">&lt;</span>
> <span class="start-tag">div</span>
> <span class="attribute-name">class</span>
> <span class="markup">=</span>
> <span class="attribute-value">"paragraph"</span>
> <span class="markup">&gt;</span><div class="expander-content">
> <div class="indent"><span class="markup">&lt;</span>
> <span class="start-tag">div</span>
> <span class="attribute-name">class</span>
> <span class="markup">=</span>
> <span class="attribute-value">"paragraphTitle"</span>
> <span class="markup">/&gt;</span></div><table><tr>
> <td class="expander">−<div class="spacer"/>
> </td><td><span class="markup">&lt;</span>
> ...
>
> This is not exactly what I want. I want to keep the HTML tags as they
> are, without any formatting.
>
> The br tags and a tags are well formed in the XML and JSON results, but
> the div tags are not kept.
> ---------
> In schema.xml I have this for the HTML content:
>
> <fieldType name="html" class="solr.TextField" />
>
>   <field name="storyFullText" type="html" indexed="true"
> stored="true" multiValued="true"/>
>
> ---------
>
> Any help would be appreciated.
>
> Thanks in advance.
>
> S. Christin


-- 
Jerome Eteve.
jerome@eteve.net
http://jerome.eteve.free.fr/