Date: Thu, 9 Oct 2014 13:55:16 -0700
From: Ahmet Arslan
To: "solr-user@lucene.apache.org"
Subject: Re: Stripping html from text before indexing to solr
Message-ID: <1412888116.57667.YahooMailNeo@web124706.mail.ne1.yahoo.com>

Hi Vishal,

Stripping html is not
mandatory. Solr indexes it just like any other text.

By the way, there are two places where you can strip html:
i) at analysis: a char filter
ii) before analysis: an update processor or the html strip transformer

Ahmet


On Thursday, October 9, 2014 11:50 PM, Vishal Sharma wrote:
Is stripping html always required before sending content to Solr, or does it
accept html-based data as well?

If yes, how does matching happen in that scenario?

I am looking for a foolproof way of indexing html data into Solr fields
so that it is always ready to match against a query string.

*Vishal Sharma*
*TL, Grazitti Interactive*
T: +1 650 641 1754
E: vishals@grazitti.com
www.grazitti.com
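
As a rough illustration of the "before analysis" idea in the reply above, here is a minimal client-side sketch that strips markup from a field value before the document is sent to Solr. It is not one of the two Solr-side options Ahmet lists (those are configured inside Solr itself and are not shown here); it only uses Python's standard html.parser module. The names TextExtractor, strip_html, and the "id"/"body" field names are hypothetical, chosen for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Join with spaces so that text from adjacent elements does not fuse
    # into a single token, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical document; the field names are assumptions, not a real schema.
doc = {
    "id": "doc1",
    "body": strip_html("<p>Hello <b>Solr</b> &amp; friends</p>"),
}
```

The resulting dict can then be posted with whatever Solr client is already in use. Joining the text pieces with spaces is a deliberate choice for search: it keeps words from neighbouring elements separate, at the cost of splitting a word that is only partly wrapped in a tag.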