lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Indexing HTML
Date Thu, 10 Jun 2010 21:16:35 GMT
Looking at it again, there appears to be only one HTML stripper. Your
alternative is to use the regex PatternReplace stuff with some custom
patterns. Or make a stopword list of all HTML keywords.
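
To make the two alternatives concrete, here is a rough sketch in plain
Python rather than Solr configuration. It only illustrates the idea behind
each approach; the pattern, the stopword list, and the function names are
all made up for this example and are not anything Solr ships.

```python
import re

# Hypothetical pattern: anything that looks like a tag, e.g. <p>, </div>, <a href="...">.
TAG_PATTERN = re.compile(r"<[^>]+>")

# Hypothetical, deliberately short list of HTML keywords used as stopwords.
HTML_STOPWORDS = {"html", "head", "body", "div", "span", "p", "a", "href", "br"}


def strip_with_pattern(text):
    """Replace every tag-like span with a space, then collapse whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())


def strip_with_stopwords(text):
    """Tokenize on non-alphanumerics and drop tokens that are HTML keywords."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [t for t in tokens if t not in HTML_STOPWORDS]


if __name__ == "__main__":
    sample = '<p>Hello <a href="x">world</a></p>'
    print(strip_with_pattern(sample))
    print(strip_with_stopwords(sample))
```

Note that the stopword version still lets attribute values (the "x" in the
sample) through as tokens, and it would also drop a legitimate word that
happens to match a keyword, so the pattern approach is probably the cleaner
of the two.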

On Thu, Jun 10, 2010 at 8:00 AM, Blargy <zmanods@hotmail.com> wrote:
>
> Do I even need to tidy/clean up the html if I use the
> HTMLStripCharFilterFactory?
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p885797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goksron@gmail.com
