lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Stripping html from text before indexing to solr
Date Thu, 09 Oct 2014 20:55:16 GMT
Hi Vishal,

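Stripping HTML is not mandatory. Solr indexes the markup just like any other text, so depending on the tokenizer, tag and attribute names can end up as searchable tokens alongside the actual content.

By the way, there are two places where you can strip HTML:
i) at analysis time: a char filter
ii) before analysis: an update processor, or the HTML strip transformer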
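
To make the "before analysis" idea concrete, here is a rough client-side sketch of what stripping amounts to, done before the document is ever sent to Solr. It assumes Python and its standard html.parser module; TextExtractor and strip_html are hypothetical helper names, not Solr components, and Solr's own char filter or update processor may treat edge cases differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (hypothetical helper, not a Solr API)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse whitespace. This keeps words
        # in adjacent block elements apart, at the cost of splitting a word
        # that has inline markup in the middle of it.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_html("<p>Hello <b>Solr</b> &amp; Lucene</p><script>var x=1;</script>"))
# prints: Hello Solr & Lucene
```

The returned string is what you would put into the Solr field instead of the raw markup, so queries match the visible text only.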

Ahmet


On Thursday, October 9, 2014 11:50 PM, Vishal Sharma <vishals@grazitti.com> wrote:
Is stripping HTML always required before sending content to Solr, or does it
accept HTML-based data as well?

If it does, how does matching happen in that scenario?

I am looking for a foolproof way of indexing HTML data into Solr fields
so that it is always ready to match against the query string.





Vishal Sharma
TL, Grazitti Interactive
T: +1 650 641 1754
E: vishals@grazitti.com
www.grazitti.com
