lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Stripping html from text before indexing to solr
Date Thu, 09 Oct 2014 22:17:44 GMT
Yes, your plain string queries will match the indexed text automatically,
whether or not you strip html.

If you don't strip html, the html tags and attributes are treated as part of
the document and can cause false matches, for example on q=bold, q=code,
q=class, etc.
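
To make the false-match point concrete, here is a minimal Python sketch. It is not Solr code: the tokenize() function is only a crude stand-in for an analyzer, and strip_html() and the sample document are assumptions made up for illustration. It compares the terms produced from raw html with the terms produced after the tags are stripped.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes, dropping tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    # Hypothetical helper: keep the text between tags, nothing else.
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


doc = '<p class="bold">Solr <code>indexes</code> markup like other text</p>'

raw_terms = set(tokenize(doc))                 # tags and attributes included
clean_terms = set(tokenize(strip_html(doc)))   # text content only

for query in ("solr", "code", "class", "bold"):
    print(query, query in raw_terms, query in clean_terms)
```

In this sketch "solr" is found either way, while "code", "class" and "bold" are found only in the unstripped version, because those terms come from the markup and not from the text.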



On Friday, October 10, 2014 12:35 AM, Vishal Sharma <vishals@grazitti.com> wrote:
I think I didn't understand you completely. I am really sorry for asking this again.
I'm new to the solr world :)

Are you saying if I don't strip html my plain string queries will
automatically match in index?

Vishal Sharma
TL, Grazitti Interactive
T: +1 650 641 1754
E: vishals@grazitti.com
www.grazitti.com

On Thu, Oct 9, 2014 at 2:05 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> It is up to you. If you strip html using a char filter, queries won't match
> html tags.
> But the original document, when requested using the fl= parameter, will
> still be html.
>
> If you do not strip html at all, q=html will return every document that
> contains an html tag.
>
> Ahmet
>
>
>
> On Friday, October 10, 2014 12:01 AM, Vishal Sharma <vishals@grazitti.com>
> wrote:
> Ahmet,
>
> So it's not necessary to strip html? Are you saying that plain text query
> strings will automatically match the html content indexed to solr?
>
> On Thu, Oct 9, 2014 at 1:55 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
> > Hi Vishal,
> >
> > Stripping html is not mandatory. Solr indexes it just like other text.
> >
> > By the way, there are two places where you can strip html:
> > i) at analysis: char filter
> > ii) before analysis: update processor, html strip transformer
> >
> > Ahmet
> >
> >
> > On Thursday, October 9, 2014 11:50 PM, Vishal Sharma <vishals@grazitti.com>
> > wrote:
> > Is stripping html always required before sending content to Solr, or does it
> > accept html data as well?
> >
> > If yes, in that scenario how does the match happen?
> >
> > Looking for a foolproof way of indexing html data into solr fields
> > so that it is always ready to match a query string
> >
>
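
The thread's point that stripping at analysis time affects only matching, while the document returned through fl= is still html, can be sketched roughly in Python as follows. This is a toy model, not Solr: TinyIndex, analyze() and the sample document are hypothetical names invented for the illustration.

```python
import re
from html.parser import HTMLParser


class _Text(HTMLParser):
    """Collects text nodes only."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def analyze(markup):
    # Hypothetical analysis chain: strip tags first, then lowercase and split.
    parser = _Text()
    parser.feed(markup)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower())


class TinyIndex:
    """Toy index: terms come from analyzed text, stored values stay untouched."""

    def __init__(self):
        self.postings = {}  # term -> set of doc ids
        self.stored = {}    # doc id -> original field value

    def add(self, doc_id, body):
        self.stored[doc_id] = body  # what a request for the field would return
        for term in analyze(body):
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        ids = sorted(self.postings.get(term.lower(), ()))
        return [self.stored[i] for i in ids]


index = TinyIndex()
index.add(1, "<p>Stripping <b>markup</b> at analysis time</p>")

print(index.search("markup"))  # matches; the returned value is still html
print(index.search("b"))       # the tag name was never indexed
```

Stripping before analysis (the update processor or transformer route mentioned in the thread) would differ in this model only in that the stored value would also be the stripped text.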
