lucene-java-user mailing list archives

From Steven A Rowe
Subject RE: HTML tags and Lucene highlighting
Date Thu, 05 Apr 2012 19:44:47 GMT

A field configured to use HTMLStripCharFilter as part of its index-time analyzer will strip
out HTML tags before index terms are created by the tokenizer, so HTML tags will not be put
into the index.  As a result, queries for HTML tags cannot match the original documents' HTML
tags (in the field configured to use HTMLStripCharFilter, anyway).

So HTMLStripCharFilter should do what you want.
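
To make the mechanism concrete, here is a minimal, stdlib-only sketch of the idea. It is not Lucene's HTMLStripCharFilter, and the class and method names (TagStrippingSketch, analyze, highlight) are hypothetical. It assumes a deliberately naive notion of markup (everything between '<' and '>') and a simple letter-or-digit tokenizer. Tags are skipped before terms are created, while each term keeps its offsets into the original text so that highlighting can still wrap the original markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in only: NOT Lucene code and not how Lucene implements it.
class TagStrippingSketch {

    /** An index term plus its start/end offsets in the ORIGINAL (unstripped) text. */
    static final class Term {
        final String text;
        final int start;
        final int end;

        Term(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Skips anything between '<' and '>' and emits lowercased letter/digit runs as terms. */
    static List<Term> analyze(String html) {
        List<Term> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int termStart = -1;
        boolean inTag = false;
        // Iterate one position past the end so a trailing term is flushed by the sentinel.
        for (int i = 0; i <= html.length(); i++) {
            char c = i < html.length() ? html.charAt(i) : ' ';
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
                continue; // characters inside a tag never reach the tokenizer
            }
            if (c == '<') {
                inTag = true;
            }
            if (!inTag && Character.isLetterOrDigit(c)) {
                if (current.length() == 0) {
                    termStart = i;
                }
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                terms.add(new Term(current.toString(), termStart, i));
                current.setLength(0);
            }
        }
        return terms;
    }

    /** Wraps every term equal to the (lowercased) query in a highlight span. */
    static String highlight(String html, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Term t : analyze(html)) {
            if (t.text.equals(q)) {
                out.append(html, last, t.start)
                   .append("<span class=\"highlighted\">")
                   .append(html, t.start, t.end)
                   .append("</span>");
                last = t.end;
            }
        }
        return out.append(html, last, html.length()).toString();
    }

    public static void main(String[] args) {
        String html = "<strong>Lucene</strong> rocks";
        System.out.println(highlight(html, "lucene")); // the word inside the tag is wrapped
        System.out.println(highlight(html, "strong")); // no term "strong" exists, text unchanged
    }
}
```

In this sketch the query string is only lowercased. In a real setup the query would normally go through an analyzer as well, and the field's index-time analyzer would be the one configured with the char filter.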


From: okayndc
Sent: Thursday, April 05, 2012 3:36 PM
To: Steven A Rowe
Subject: Re: HTML tags and Lucene highlighting


I want to ignore HTML tags within a search.  I should not be able to search for an HTML tag
(e.g. <strong>) and get back the highlighted HTML tag (e.g. <span class="highlighted"><strong></span>)
in a result set.


On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe wrote:
Hi okayndc,

What *do* you want?


-----Original Message-----
From: okayndc
Sent: Thursday, April 05, 2012 1:34 PM
Subject: HTML tags and Lucene highlighting


I currently use Lucene version 3.0 and probably need to upgrade to a more current version soon.
The problem I have is that when I test a search for an HTML tag (e.g. <strong>), Lucene returns
the highlighted HTML tag, which is what I DO NOT want.  Is there a way to "filter" HTML tags?
I have read up on HTMLStripCharFilter (packaged with Solr) and wondered if this is the way
to go?

Any help will be greatly appreciated,