lucene-java-user mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: bigram analysis
Date Mon, 03 Mar 2008 15:49:47 GMT

> Not sure; you might want to ask on the Nutch list. From a strict language
> standpoint, the notion of a stopword is, to my mind, a bit dubious. If
> a word really has no meaning, why does the language have it to
> begin with? In a search context, stopwords were treated as being of
> minimal use in the early days, mostly because of space and memory
> considerations. Nowadays, as our search capabilities become more
> sophisticated, I think they can be useful for better phrase
> matching, etc., as well as for more advanced NLP search. Now the
> general response seems to be: disk is cheap, so why throw away information?
To limit disk writes, and to simplify merges?

I don't know what proportion of the words in typical current texts are stopwords.
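One way to get a rough figure is to measure it on a sample of your own documents. Below is a minimal sketch in plain Java, assuming a naive letter-only tokenizer and a small, hypothetical English stopword list chosen for illustration; it does not use Lucene's analyzers, so real numbers will depend on the analyzer and stopword list you actually index with.

```java
import java.util.Locale;
import java.util.Set;

public class StopwordRatio {
    // Hypothetical, deliberately small stopword list for illustration only;
    // a real setup would use the list configured in your analyzer.
    static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
        "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
        "such", "that", "the", "their", "then", "there", "these", "they",
        "this", "to", "was", "will", "with");

    /** Returns the fraction of tokens in text that are stopwords (0.0 for empty text). */
    static double stopwordRatio(String text) {
        // Naive tokenization (an assumption): lowercase, then split on runs of non-letters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        int total = 0;
        int stops = 0;
        for (String t : tokens) {
            if (t.isEmpty()) continue; // a leading separator yields an empty first token
            total++;
            if (STOP_WORDS.contains(t)) stops++;
        }
        return total == 0 ? 0.0 : (double) stops / total;
    }

    public static void main(String[] args) {
        String sample = "The word really has no meaning, then why does the language have it?";
        // 5 of the 13 tokens are in the list above, so this prints 0.38.
        System.out.printf("stopword ratio: %.2f%n", stopwordRatio(sample));
    }
}
```

Running it over a representative slice of a corpus, rather than one sentence, would give a more meaningful estimate of how much index space stopwords actually take.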

M.
