lucene-ruby-dev mailing list archives

From "Thiago Jackiw" <tjac...@gmail.com>
Subject Re: Can't render html entities when adding documents
Date Wed, 20 Jun 2007 06:06:55 GMT
Replying to my own post: I just tried Solr 1.2 with the two previous
versions of acts_as_solr and it worked great, so I'm pretty sure this
is a solr-ruby issue. I'll do some more testing of the way solr-ruby
adds documents to Solr.

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com


On 6/19/07, Thiago Jackiw <tjackiw@gmail.com> wrote:
> What's interesting is that in the previous versions of acts_as_solr
> (without solr-ruby) the HTML entities were getting indexed fine
> without passing through ERB's html_escape method. That's what I did as
> a quick fix before starting this thread.
>
> Did anything change in Solr 1.2 with regard to XML parsing? I guess
> I should try the previous version of the acts_as_solr plugin with Solr
> 1.2 to see if I get the same error.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
>
>
> On 6/19/07, Aaron Suggs <aaron@ktheory.com> wrote:
> > I was getting the same XmlPullParserException from Solr while using
> > solr-ruby to index HTML.
> >
> > I solved things by running text through the html_escape() method in
> > ERB::Util before submitting to Solr.
> >
> > In the console, the following generates the XmlPullParserException in
> > Solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:
> >
> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
> >     add(:id => 1, :value_t => '&nbsp;')
> > Net::HTTPFatalError: 500...XmlPullParserException...
> >
> > But escaping the characters with html_escape (aliased as the h()
> > method by default) works like a charm:
> >
> >   include ERB::Util
> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
> >     add(:id => 1, :value_t => h('&nbsp;'))
> > => true
> >
> > Subsequently, searching for strings like 'nbsp' returns hits on those
> > escaped entities, which may or may not be what you want:
> > >> Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
> > => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]
> >
> > If you don't want searches for 'nbsp' to return all documents with
> > escaped non-breaking spaces, the solution lies in defining a new
> > fieldtype in solr/conf/schema.xml.
> >
> > -Aaron Suggs
> >
> > On 6/19/07, Yonik Seeley <yonik@apache.org> wrote:
> > > On 6/19/07, Thiago Jackiw <tjackiw@gmail.com> wrote:
> > > > There's something funky with solr-ruby's xml processing when adding
> > > > documents, but I don't really know what it is yet. It can't process
> > > > html entities at all, not even an html blank space "&nbsp;":
> > >
> > > nbsp is not a default XML entity.
> > > Try replacing it with &#160;
> > >
> > > -Yonik
> > >
> >
>
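
The two workarounds discussed in this thread can be compared in a short
Ruby sketch. It needs no running Solr instance and only shows what each
approach does to the string before it is sent. The helper
named_to_numeric and the lookup table HTML_ENTITY_CODEPOINTS are
hypothetical names made up for illustration; they are not part of
solr-ruby or acts_as_solr, and the table is deliberately tiny.

```ruby
require 'erb'

# Hypothetical, deliberately small lookup table (an assumption for this
# sketch); a real one would cover the full list of HTML named entities.
HTML_ENTITY_CODEPOINTS = {
  'nbsp'   => 160,
  'copy'   => 169,
  'eacute' => 233
}.freeze

# Replace known HTML named entities with numeric character references,
# which an XML parser accepts without a DTD. Names missing from the table
# (including XML's own amp, lt, gt, quot, apos) are left untouched.
def named_to_numeric(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    codepoint = HTML_ENTITY_CODEPOINTS[$1]
    codepoint ? format('&#%d;', codepoint) : match
  end
end

# Approach 1 (Aaron): escape the ampersand, so the parser sees the
# literal text "&nbsp;" instead of an undefined entity reference.
escaped = ERB::Util.html_escape('&nbsp;')   # => "&amp;nbsp;"

# Approach 2 (Yonik): keep the character itself by using its numeric
# character reference.
numeric = named_to_numeric('a&nbsp;b')      # => "a&#160;b"
```

With the first approach the indexed text contains the letters "nbsp",
which is why a search for 'nbsp' matches. With the second, the field
would hold an actual non-breaking space character, assuming the value
reaches Solr's XML parser unchanged.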
