lucene-java-user mailing list archives

From Craig Walls
Subject Re: HTML Analyzer?
Date Thu, 14 Nov 2002 20:39:03 GMT

Ironically, I had to solve this exact problem just 10 minutes ago...

Check into javax.swing.text.html.HTMLEditorKit and
javax.swing.text.html.HTMLDocument. Here's a URL that I found helpful (the site
is Japanese, but the source code is still Java):
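
For illustration, here is a minimal sketch of the Swing-parser approach. It is
not the code from the site mentioned above. The class name HtmlTextExtractor
and the method extract are hypothetical names chosen for this example. It uses
javax.swing.text.html.parser.ParserDelegator with an
HTMLEditorKit.ParserCallback and keeps only the text the parser reports,
so the tags themselves are dropped:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text content of an HTML string.
public class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        text.append(data).append(' ');
    }

    // Parses the given HTML and returns the collected text.
    public static String extract(String html) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try (StringReader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset
            // declarations, since the input is already a String.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }
}
```

Assuming this works for your content, the returned string could be handed to
Lucene in place of the raw HTML when the document is indexed, so a custom
Analyzer may not be needed.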

"Lichty, Kent" wrote:

> We have a web application that builds pages "on the fly" by reading directly
> from a database. The database contains both normal content and HTML.  We use
> Lucene as our search engine, but I need to figure out how to cause it to NOT
> include content that is within HTML tags. I assume that this entails the
> creation of a custom Analyzer.  Are there any existing Analyzers already out
> there that work like this? Thanks!
