lucene-java-user mailing list archives

From Sebastian Ho
Subject clean up html before indexing or add tags to ignore list
Date Thu, 13 May 2004 01:25:42 GMT

This is a typical web crawler, indexing, and search application.
I have written my crawler and am planning to add Lucene next.
One question comes to mind: in terms of performance, should I clean up
the HTML by removing all tags before indexing, or should I add all the
tags to the ignore list at the indexing/search stage?

Which is better?
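
To make the first option concrete, here is a rough sketch of what I mean
by cleaning up the HTML before indexing. It uses only java.util.regex and
no Lucene calls. The class and method names (HtmlCleaner, stripTags) are
made up for illustration, and a regex pass like this is an assumption on
my part rather than a proper HTML parser: it does not decode entities
such as &amp; and it will trip over malformed markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper: reduces an HTML page to plain text before the
// text is handed to the indexer. A naive regex sketch, not a real parser.
public class HtmlCleaner {
    // Script and style elements are dropped together with their contents.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments, possibly spanning several lines.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag, opening or closing.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String stripTags(String html) {
        // Replace with a space so words on either side of a tag stay apart.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Collapse the runs of whitespace left behind by the removals.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Test</title><style>b{}</style></head>"
                    + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(page)); // prints "Test Hello world"
    }
}
```

With this approach the indexer only ever sees the cleaned text, so tag
names never become terms in the index in the first place.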


Sebastian Ho
