lucene-java-user mailing list archives

From okayndc <bodymo...@gmail.com>
Subject Re: HTML tags and Lucene highlighting
Date Thu, 05 Apr 2012 20:34:44 GMT
I want to retain the formatted HTML in a result, but I want to ignore (or
filter out) HTML tags in a search, if that makes sense.
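
To make that requirement concrete, here is a minimal sketch in plain Java of
the idea: search only the text outside the tags, but keep a map back to the
original positions so the highlight can be applied to the untouched HTML. It
is a simplified stand-in, not Lucene's HTMLStripCharFilter; the class and
method names (TagStripSketch, strip, highlight) are hypothetical, and it
assumes well-formed tags with no stray "<" in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; NOT Lucene's HTMLStripCharFilter.
// Assumes every '<' opens a tag and the next '>' closes it.
final class TagStripSketch {

    /** Visible text plus, for each visible character, its index in the original HTML. */
    static final class Stripped {
        final String text;
        final int[] originalOffsets;

        Stripped(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }
    }

    /** Drops everything between '<' and '>' and records where each kept character came from. */
    static Stripped strip(String html) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append(c);
                offsets.add(i);
            }
        }
        int[] map = new int[offsets.size()];
        for (int k = 0; k < map.length; k++) {
            map[k] = offsets.get(k);
        }
        return new Stripped(out.toString(), map);
    }

    /**
     * Wraps the first case-sensitive occurrence of term in a highlight span,
     * matching against the visible text only. Tags can never match.
     * Limitation: a match that straddles a tag would yield badly nested markup.
     */
    static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html;
        }
        Stripped s = strip(html);
        int hit = s.text.indexOf(term);
        if (hit < 0) {
            return html; // term occurs only inside tags, or not at all
        }
        int start = s.originalOffsets[hit];
        int end = s.originalOffsets[hit + term.length() - 1] + 1;
        return html.substring(0, start)
                + "<span class=\"highlighted\">"
                + html.substring(start, end)
                + "</span>"
                + html.substring(end);
    }
}
```

With this sketch, highlighting "bold" in <p>Some <strong>bold</strong> text</p>
wraps only the word inside the <strong> element, while a search for "strong"
or "<strong>" leaves the HTML unchanged.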

On Thu, Apr 5, 2012 at 3:44 PM, Steven A Rowe <sarowe@syr.edu> wrote:

> okayndc,
>
> A field configured to use HTMLStripCharFilter as part of its index-time
> analyzer will strip out HTML tags before index terms are created by the
> tokenizer, so HTML tags will not be put into the index.  As a result,
> queries for HTML tags cannot match the original documents' HTML tags (in
> the field configured to use HTMLStripCharFilter, anyway).
>
> So HTMLStripCharFilter should do what you want.
>
> Steve
>
> From: okayndc [mailto:bodymoves@gmail.com]
> Sent: Thursday, April 05, 2012 3:36 PM
> To: Steven A Rowe
> Cc: java-user@lucene.apache.org
> Subject: Re: HTML tags and Lucene highlighting
>
> Hello,
>
> I want to ignore HTML tags within a search.  I should not be able to
> search for an HTML tag (ex. <strong>) and get back the highlighted HTML tag
> (ex. <span class="highlighted"><strong></span>) in a result set.
>
> Thanks
>
> On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe <sarowe@syr.edu> wrote:
> Hi okayndc,
>
> What *do* you want?
>
> Steve
>
> -----Original Message-----
> From: okayndc [mailto:bodymoves@gmail.com]
> Sent: Thursday, April 05, 2012 1:34 PM
> To: java-user@lucene.apache.org
> Subject: HTML tags and Lucene highlighting
>
> Hello,
>
> I currently use Lucene version 3.0 and probably need to upgrade to a more
> current version soon.
> The problem I have is that when I test a search for an HTML tag (ex.
> <strong>), Lucene returns
> the highlighted HTML tag, which is what I DO NOT want.  Is there a way to
> "filter" HTML tags?
> I have read up on HTMLStripCharFilter (packaged with Solr) and wondered
> if this is the way to go?
>
> Any help will be greatly appreciated,
> Thanks
