lucene-java-user mailing list archives

From "Erik Hatcher" <li...@ehatchersolutions.com>
Subject Re: Indexing HTML with Lucene
Date Tue, 05 Mar 2002 14:37:47 GMT
You have to do it yourself, or at least find code that does this.  The
Lucene sample code has an HTML parser, and I've posted (to lucene-dev) an
alternative way of using JTidy to do this.
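
To illustrate the "do it yourself" route, here is a minimal sketch of
pulling the text out of an HTML string before handing it to the indexer.
This is not the parser from the Lucene sample code and not the JTidy
approach; the class name HtmlTextExtractor and the method stripTags are
made up for the example. It is deliberately naive: it does not decode
entities, and it keeps the contents of script and style elements.

```java
public class HtmlTextExtractor {

    /**
     * Hypothetical helper: drops everything between '<' and '>' and
     * collapses whitespace. A rough sketch, not a real HTML parser.
     */
    public static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;                 // start skipping markup
            } else if (c == '>' && inTag) {
                inTag = false;
                out.append(' ');              // keep words on either side of a tag apart
            } else if (!inTag) {
                out.append(c);                // ordinary text, keep it
            }
        }
        // Collapse runs of whitespace left behind by the removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1>"
                + "<p>Some <b>bold</b> text.</p></body></html>";
        // The resulting plain text is what you would pass to Lucene as
        // the field value, instead of the raw HTML.
        System.out.println(stripTags(html));
    }
}
```

Because every tag is replaced by a space, a tag in the middle of a word
(for example wor<b>ld</b>) splits that word in two. A real parser is the
better choice for anything beyond simple pages.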

    Erik

----- Original Message -----
From: "Melissa Mifsud" <melissamifsud@yahoo.com>
To: "Lucene User" <lucene-user@jakarta.apache.org>
Sent: Tuesday, March 05, 2002 9:14 AM
Subject: Indexing HTML with Lucene


Hi,

Is it necessary to strip the HTML tags from HTML documents BEFORE telling
Lucene to index them? Does Lucene do this or will it index the tags too?!

Melissa



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

