lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Byrne <john.by...@propylon.com>
Subject Re: bigram analysis
Date Mon, 03 Mar 2008 17:18:46 GMT
Yes, this makes sense to me. I think I'll just keep all words, including 
stop words, and if performance ever becomes an issue, I'll look at 
bigrams again. But I think there's a good chance that I'll never see 
significant impact either way.

Thanks guys!

Grant Ingersoll wrote:
> Yep, still good reasons like I said, but becoming less important as 
> the hardware, etc. gets faster and cheaper, IMO, especially in the 
> context of more advanced search capabilities.
>
> On Mar 3, 2008, at 10:49 AM, Mathieu Lecarme wrote:
>
>>
>>> Not sure, you might want to ask on Nutch.  From a strict language 
>>> standpoint, the notion of a stopword in my mind is a bit dubious.  
>>> If the word really has no meaning, then why does the language have 
>>> it to begin with?  In a search context, it has been treated as of 
>>> minimal use in the early days mostly because of space and memory 
>>> considerations.  Now a days, as we get more sophisticated in our 
>>> search capabilities, I think it can be useful for doing better 
>>> phrase matching, etc. as well as more advanced NLP search.  Now it 
>>> seems like the general response is disk is cheap, why throw away 
>>> information?
>> To limit writing on disk, to simplify merge ?
>>
>> I don't know the ratio of stop word in current texts.
>>
>> M.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message