lucene-java-user mailing list archives

From Emmanuel Bridonneau <EBridonn...@epicentric.com>
Subject Html documents parsing
Date Tue, 20 Nov 2001 00:14:43 GMT
I am confused about how Lucene parses an HTML document. It
doesn't do any tag stripping (or does it?), so does that mean it
also indexes all HTML tags? If so, a search for "body" will
return every HTML document previously indexed.
I'd appreciate it if anyone could shed some light on FAQ.10 about
indexing.
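
To make the concern concrete, here is a minimal sketch in plain Java. It does not use Lucene and makes no claim about what Lucene actually does; the class TagStripSketch and its methods stripTags and tokens are hypothetical names invented for illustration. It only shows why tokenizing raw markup would make "body" match every page, and how a naive pre-processing step would avoid that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TagStripSketch {
    // Naive tag removal: replaces everything between '<' and '>' with a space.
    // Assumption: simple, well-formed markup. Comments, scripts, entities and
    // '>' characters inside attribute values are not handled.
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Splits text into lowercase alphanumeric tokens, a rough stand-in for
    // what a simple tokenizer might produce.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello Lucene</p></body></html>";
        // Raw markup: tag names such as "html" and "body" become tokens.
        System.out.println(tokens(html));
        // Stripped first: only the visible text is tokenized.
        System.out.println(tokens(stripTags(html)));
    }
}
```

If the raw markup were tokenized this way, the tag names would become searchable terms; stripping the tags first leaves only the visible text.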



