> Not sure, you might want to ask on Nutch. From a strict language
> standpoint, the notion of a stopword in my mind is a bit dubious. If
> the word really has no meaning, then why does the language have it to
> begin with? In a search context, it has been treated as of minimal
> use in the early days mostly because of space and memory
> considerations. Now a days, as we get more sophisticated in our
> search capabilities, I think it can be useful for doing better phrase
> matching, etc. as well as more advanced NLP search. Now it seems like
> the general response is disk is cheap, why throw away information?
To limit writing on disk, to simplify merge ?
I don't know the ratio of stop word in current texts.
M.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|