lucene-java-user mailing list archives

From Sebastian Ho <sebasti...@bii.a-star.edu.sg>
Subject clean up html before indexing or add tags to ignore list
Date Thu, 13 May 2004 01:25:42 GMT
Hi

I am developing a typical web crawler, indexing and search application.
I have written my crawler and am planning to add Lucene next.
One question comes to mind: in terms of performance, should I clean up
the HTML by removing all tags before indexing, or should I add all the
tags to the ignore list during the indexing/search stage?

Which is better?
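For reference, here is a minimal sketch of the first option: stripping markup so that only plain text reaches the indexer. It is written in Python for brevity and uses only the standard library's `html.parser`. The names `TextExtractor` and `strip_html` are hypothetical. Skipping `<script>`/`<style>` contents is an assumption about what counts as indexable text. A Java crawler would apply the same idea with its own HTML parser before building the Lucene document.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text content of a page, dropping tags.

    Assumption: text inside <script> and <style> is not worth indexing,
    so it is skipped as well.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks (<p>a</p><p>b</p>)
        # do not run together, then collapse the leftover whitespace.
        return " ".join(" ".join(self._parts).split())


def strip_html(html: str) -> str:
    """Return the plain text of an HTML document (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `strip_html("<p>Hello <b>Lucene</b> &amp; friends</p>")` gives `"Hello Lucene & friends"`. Because a space is inserted wherever a tag was removed, a word split by inline markup (such as `Luc<b>ene</b>`) ends up as two tokens. That is a deliberate simplification in this sketch.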

Thanks

Sebastian Ho


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

