lucene-java-user mailing list archives

From Cool Coder <techcool.ku...@yahoo.com>
Subject Re: HTML analyzer
Date Sat, 27 Oct 2007 04:24:21 GMT
Thanks, Karl, for your input. Solr already has a built-in HTML strip reader, HTMLStripReader,
which I am now using to strip all HTML tags before creating the index. This also
solved my earlier problem with the highlighter, which was highlighting text inside HTML tags.
For example, I was searching for "net", and a result containing something like
http://sdjkkjsd.net was converted to http://sjhdnjkshn.<b>net</b> by the highlighter.
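To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use Solr's HTMLStripReader or the Lucene highlighter: the regex-based stripTags and the highlight helper are hypothetical stand-ins, and the example.net link is made up. It only shows why highlighting raw HTML can rewrite a URL inside a tag, and why stripping tags before indexing avoids that.

```java
import java.util.regex.Pattern;

// Naive illustration only. A real setup would use a proper HTML stripper
// (such as Solr's HTMLStripReader) and the Lucene highlighter instead.
final class HtmlStripSketch {
    // Assumption: tags are well formed and contain no '>' inside attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replace every tag with a space, then collapse runs of whitespace.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    // Wrap each whole-word occurrence of term in <b>...</b>.
    static String highlight(String text, String term) {
        return text.replaceAll("\\b" + Pattern.quote(term) + "\\b", "<b>$0</b>");
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.net\">the net</a>";
        // Highlighting the raw HTML also rewrites the URL inside the href attribute.
        System.out.println(highlight(html, "net"));
        // Highlighting the stripped text only touches the visible words.
        System.out.println(highlight(stripTags(html), "net"));
    }
}
```

Stripping before indexing means the stored and highlighted text no longer contains markup at all, so the highlighter has no tags or attribute values to damage.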
   
  -BR

Karl Wettin <karl.wettin@gmail.com> wrote:
  
On 25 Oct 2007, at 20:18, Cool Coder wrote:

> Is there any analyzer that can be configured

All of them can be.

TokenFilter.html>

I suggest you take a look at the code of any of them, 
StandardAnalyzer for instance.



-- 
karl
