lucene-ruby-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Can't render html entities when adding documents
Date Wed, 20 Jun 2007 10:22:44 GMT
Thiago,

I'll have to look late this week/weekend if I get a chance then, but  
how did acts_as_solr create the XML passed to Solr?   I think you  
used my original hack for that communication which used REXML,  
right?   solr-ruby now supports both REXML and libxml2 - and I've  
found that libxml2 does things properly whereas REXML was screwing  
things up.

I suspect we can come up with a simple test case that shows where  
things are wacky.  If you can submit one of those I'll be glad to  
look into this as soon as I can (this weekend at the earliest).

	Erik


On Jun 20, 2007, at 2:06 AM, Thiago Jackiw wrote:

> Replying to my own post, I just tried with solr 1.2 with the last 2
> previous versions of acts_as_solr and it worked great, so I'm pretty
> sure this is a solr-ruby issue. I'll do some more testing with the way
> solr-ruby adds documents to Solr.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
>
>
> On 6/19/07, Thiago Jackiw <tjackiw@gmail.com> wrote:
>> What's interesting is that on the previous versions of acts_as_solr
>> (without solr-ruby) the html entities were getting indexed fine
>> without passing through ERB's html_escape method. That's what I did as
>> a fast fix before starting this thread.
>>
>> Did anything change in Solr 1.2 in regards to xml parsing? And I guess
>> I should try the previous version of the acts_as_solr plugin with Solr
>> 1.2 to see if I get the same error.
>>
>> --
>> Thiago Jackiw
>> acts_as_solr => http://acts-as-solr.railsfreaks.com
>>
>>
>> On 6/19/07, Aaron Suggs <aaron@ktheory.com> wrote:
>> > I was getting the same XmlPullParserException from solr while using
>> > solr-ruby to index HTML.
>> >
>> > I solved things by running text through the html_escape() method in
>> > ERB::Util before submitting to Solr.
>> >
>> > In the console, the following generates the XmlPullParserException in
>> > solr, which manifests itself as a Net::HTTPFatalError in solr-ruby:
>> >
>> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).add(:id => 1, :value_t => '&nbsp;')
>> > Net::HTTPFatalError: 500...XmlPullParserException...
>> >
>> > But escaping the characters with html_escape (aliased as the h()
>> > method by default) works like a charm:
>> >
>> >   include ERB::Util
>> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).add(:id => 1, :value_t => h('&nbsp;'))
>> > => true
>> >
>> > Subsequently, searching for strings like 'nbsp' returns hits on those
>> > escaped entities, which may or may not be what you want:
>> > >> Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
>> > => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]
>> >
>> > If you don't want searches for 'nbsp' to return all documents with
>> > escaped non-breaking spaces, the solution lies in defining some new
>> > fieldtype in solr/conf/schema.xml
>> >
>> > -Aaron Suggs
>> >
>> > On 6/19/07, Yonik Seeley <yonik@apache.org> wrote:
>> > > On 6/19/07, Thiago Jackiw <tjackiw@gmail.com> wrote:
>> > > > There's something funky with solr-ruby's xml processing when adding
>> > > > documents, but I don't really know what it is yet. It can't process
>> > > > html entities at all, not even an html blank space "&nbsp;":
>> > >
>> > > nbsp is not a default XML entity.
>> > > Try replacing it with &#160;
>> > >
>> > > -Yonik
>> > >
>> >
>>
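Yonik's suggestion in the thread above (replace &nbsp; with &#160;) can be generalized a little. The following is only a sketch, not part of solr-ruby or acts_as_solr: the helper name numeric_entities and the small lookup table are hypothetical and would need extending for real HTML input. It rewrites named HTML entities as numeric character references and leaves the five entities that XML predefines untouched.

```ruby
# Hypothetical lookup table: HTML entity name => Unicode code point.
# Only three entries are shown; real input would need many more.
HTML_ENTITY_CODEPOINTS = {
  'nbsp'   => 160,
  'copy'   => 169,
  'eacute' => 233
}.freeze

# Entities predefined by XML itself; an XML parser accepts these as-is.
XML_PREDEFINED = %w[amp lt gt quot apos].freeze

# Hypothetical helper: replace known named HTML entities with numeric
# character references, which any XML parser can resolve.
def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    name = Regexp.last_match(1)
    next match if XML_PREDEFINED.include?(name)

    codepoint = HTML_ENTITY_CODEPOINTS[name]
    # Unknown names are left alone so the caller can decide what to do.
    codepoint ? format('&#%d;', codepoint) : match
  end
end

puts numeric_entities('caf&eacute;&nbsp;&amp; bar')
# prints caf&#233;&#160;&amp; bar
```

Note that this only helps if the text reaches Solr without being escaped a second time; whether solr-ruby escapes field values itself depends on the XML backend in use, as Erik points out.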

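Aaron's workaround in the thread above can be checked in isolation, without a Solr server. This minimal sketch uses only ERB::Util from the Ruby standard library and shows what html_escape (and its alias h) does to the entity before it would be handed to solr-ruby.

```ruby
require 'erb'

raw = '&nbsp;'

# html_escape turns the leading ampersand into &amp;, so the resulting
# string contains only an entity that XML predefines.
escaped = ERB::Util.html_escape(raw)

puts escaped           # prints &amp;nbsp;
puts ERB::Util.h(raw)  # prints &amp;nbsp; (h is an alias of html_escape)
```

Once an XML parser decodes &amp;nbsp; it yields the literal six characters &nbsp;, which is consistent with Aaron's observation that a search for 'nbsp' matches those documents.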
