lucene-java-user mailing list archives

From Mathieu Lecarme
Subject Re: bigram analysis
Date Mon, 03 Mar 2008 15:49:47 GMT

> Not sure, you might want to ask on Nutch.  From a strict language 
> standpoint, the notion of a stopword in my mind is a bit dubious.  If 
> the word really has no meaning, then why does the language have it to 
> begin with?  In a search context, it has been treated as of minimal 
> use in the early days mostly because of space and memory 
> considerations.  Nowadays, as we get more sophisticated in our 
> search capabilities, I think it can be useful for doing better phrase 
> matching, etc. as well as more advanced NLP search.  Now it seems like 
> the general response is disk is cheap, why throw away information?
To limit writes to disk and to simplify merges?

I don't know what proportion of typical text consists of stop words.
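
One rough way to estimate that proportion on a sample is sketched below. It assumes a small, purely illustrative stop list (not Lucene's default set) and a simple regex tokenizer rather than a Lucene Analyzer; the names `STOP_WORDS` and `stop_word_ratio` are hypothetical.

```python
import re

# Hypothetical, illustrative stop list; NOT Lucene's actual default set.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def stop_word_ratio(text, stop_words=STOP_WORDS):
    """Return the fraction of tokens in `text` that are stop words."""
    # Naive tokenizer: lowercase, then keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stop_words)
    return hits / len(tokens)


sample = "The notion of a stopword is a bit dubious in the context of search."
print(f"{stop_word_ratio(sample):.0%} of tokens are stop words")
```

The result depends heavily on the stop list and on the corpus, so it would need to be run over representative documents before drawing any conclusion about index size or merge cost.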

