lucene-solr-user mailing list archives

From Vishal Sharma <vish...@grazitti.com>
Subject Re: Stripping html from text before indexing to solr
Date Fri, 10 Oct 2014 17:21:43 GMT
Oh gotcha.

Thanks for that!

Vishal Sharma
TL, Grazitti Interactive
T: +1 650 641 1754
E: vishals@grazitti.com
www.grazitti.com
LinkedIn: <http://www.linkedin.com/company/grazitti-interactive>
Twitter: <https://twitter.com/grazitti>
Facebook: <https://www.facebook.com/grazitti.interactive>
Dreamforce®, Oct 13-16, 2014: meet us at the Cloud Expo, Booth N2341, Moscone North, San Francisco
Schedule a meeting: <http://www.vcita.com/v/grazittiinteractive/online_scheduling#/schedule>
ZakCalendar, Dreamforce® Featured App: <https://appexchange.salesforce.com/listingDetail?listingId=a0N3000000B5UPKEA3>

On Thu, Oct 9, 2014 at 3:17 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Yes, your plain string queries will match the indexed text whether or not
> you strip the HTML. That is always true.
>
>
> If you don't strip HTML, the tags are treated as part of the document text
> and would cause false matches.
> For example: q=bold, q=code, q=class, etc.
>
>
>
> On Friday, October 10, 2014 12:35 AM, Vishal Sharma <vishals@grazitti.com>
> wrote:
> I think I didn't get you completely. I am really sorry for asking this again.
> New to the Solr world :)
>
> Are you saying that if I don't strip HTML, my plain string queries will
> automatically match in the index?
>
>
> On Thu, Oct 9, 2014 at 2:05 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
> > It is up to you. If you strip HTML using a char filter, queries won't
> > match HTML tags.
> > But the original document, when requested using the fl= parameter, will
> > still be HTML.
> >
> > If you do not strip HTML at all, q=html will return every document that
> > contains an <html> tag.
> >
> > Ahmet
> >
> >
> >
> > On Friday, October 10, 2014 12:01 AM, Vishal Sharma <vishals@grazitti.com>
> > wrote:
> > Ahmet,
> >
> > So it's not necessary to strip HTML? Are you saying that plain-text query
> > strings will automatically match the HTML content indexed to Solr?
> >
> >
> > On Thu, Oct 9, 2014 at 1:55 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> > wrote:
> >
> > > Hi Vishal,
> > >
> > > Stripping HTML is not mandatory. Solr indexes it just like other text.
> > >
> > > By the way, there are two places where you can strip HTML:
> > > i) at analysis: char filter
> > > ii) before analysis: update processor, HTML strip transformer
> > >
> > > Ahmet
> > >
> > >
> > > On Thursday, October 9, 2014 11:50 PM, Vishal Sharma <vishals@grazitti.com>
> > > wrote:
> > > Is stripping HTML always required before sending content to Solr, or
> > > does it accept HTML-based data as well?
> > >
> > > If it does, how does matching work in that scenario?
> > >
> > > I am looking for a foolproof way of indexing HTML data into Solr fields
> > > so that it is always ready to match against a query string.
> > >
>
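
To make the false-match point from the thread concrete, here is a minimal Python sketch. It is not Solr code and does not reproduce Solr's analysis chain: strip_html and tokenize are hypothetical stand-ins for an HTML-stripping char filter and a tokenizer with lowercasing, and the sample document is made up for illustration. It only shows why markup left in the text ends up as searchable terms, while the stored value stays HTML either way.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Hypothetical stand-in for an HTML-stripping char filter."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Hypothetical stand-in for a tokenizer plus lowercase filter."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Made-up sample document (an assumption, not taken from the thread).
doc = '<p class="intro">Solr is <b>fast</b></p>'

raw_terms = tokenize(doc)                # markup is indexed like any other text
clean_terms = tokenize(strip_html(doc))  # markup removed before tokenizing
stored_value = doc                       # what a field listing would return

print(sorted(raw_terms))    # ['b', 'class', 'fast', 'intro', 'is', 'p', 'solr']
print(sorted(clean_terms))  # ['fast', 'is', 'solr']
```

A query for "class" would hit the unstripped terms but not the stripped ones, while a query for "fast" matches in both cases. In Solr itself, stripping at analysis time affects only the indexed terms, which is why the document returned via fl= is still HTML.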
