From: Craig Walls
Organization: Michaels Stores Inc.
Date: Thu, 14 Nov 2002 14:39:03 -0600
To: Lucene Users List
Subject: Re: HTML Analyzer?

Ironically, I had to solve this exact problem just 10 minutes ago...

Check into javax.swing.text.html.HTMLEditorKit and
javax.swing.text.html.HTMLDocument.

Here's a URL that I found helpful (the site is Japanese, but the source code
is still Java):

http://java-house.jp/ml/archive/j-h-b/037727.html?#_body

"Lichty, Kent" wrote:

> We have a web application that builds pages "on the fly" by reading directly
> from a database. The database contains both normal content and HTML.
> We use Lucene as our search engine, but I need to figure out how to cause it
> to NOT include content that is within HTML tags. I assume that this entails
> the creation of a custom Analyzer. Are there any existing Analyzers already
> out there that work like this? Thanks!
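
As a rough illustration of the HTMLEditorKit suggestion in the reply, here is a minimal sketch of stripping tags before the text is handed to the indexer. It is not the code from the linked page. The class name HtmlTextExtractor and the method extract are hypothetical names chosen for this example, and the whitespace normalization is an assumption about what a search index would want. The sketch uses the JDK's HTMLEditorKit.ParserCallback and ParserDelegator rather than a custom Analyzer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects only the text content the Swing HTML parser reports.
final class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        // Separate runs with a space so words split by tags do not fuse together.
        text.append(data).append(' ');
    }

    // Returns the text of the given HTML fragment with the markup removed.
    static String extract(String html) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try (StringReader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore any charset declared in the markup.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        // Assumption: collapsing whitespace is acceptable for indexing purposes.
        return text(callback);
    }

    private static String text(HtmlTextExtractor callback) {
        return callback.text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

The string returned by HtmlTextExtractor.extract(...) could then be indexed in place of the raw database content, so the existing Analyzer never sees the tags. One known limitation of this sketch: a word interrupted by inline markup (for example "wor<b>ld</b>") comes out as two tokens.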