lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: bigram analysis
Date Mon, 03 Mar 2008 17:11:49 GMT
Yep, still good reasons like I said, but becoming less important as  
the hardware, etc. gets faster and cheaper, IMO, especially in the  
context of more advanced search capabilities.

On Mar 3, 2008, at 10:49 AM, Mathieu Lecarme wrote:

>
>> Not sure, you might want to ask on Nutch.  From a strict language  
>> standpoint, the notion of a stopword in my mind is a bit dubious.   
>> If the word really has no meaning, then why does the language have  
>> it to begin with?  In a search context, it has been treated as of  
>> minimal use in the early days mostly because of space and memory  
>> considerations.  Now a days, as we get more sophisticated in our  
>> search capabilities, I think it can be useful for doing better  
>> phrase matching, etc. as well as more advanced NLP search.  Now it  
>> seems like the general response is disk is cheap, why throw away  
>> information?
> To limit writing on disk, to simplify merge ?
>
> I don't know the ratio of stop word in current texts.
>
> M.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message