lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: [OT] About stopwords
Date Thu, 27 Nov 2008 13:43:46 GMT

That's a phrase search, so it's conceivable Google could be doing
something similar to Nutch, whereby adjacent n-grams are indexed as
unique terms.
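
To make the n-gram idea concrete, here is a minimal sketch in plain Python (not Lucene or Nutch code). It assumes a small hypothetical list of common words, and it indexes every adjacent word pair that involves one of them as its own term. The names COMMON, index_terms, build_index and phrase_candidates are made up for illustration, and nothing here is a claim about what Google or Nutch actually does internally.

```python
from collections import defaultdict

# Hypothetical list of very common words, for illustration only.
COMMON = {"how", "at", "of", "a", "the"}


def bigrams_with_common(words):
    """Join each adjacent pair that involves at least one common word."""
    return [
        left + "_" + right
        for left, right in zip(words, words[1:])
        if left in COMMON or right in COMMON
    ]


def index_terms(text):
    """Return the single words plus the common-word bigrams of a text."""
    words = text.lower().split()
    return words + bigrams_with_common(words)


def build_index(docs):
    """Map each term (word or bigram) to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index


def phrase_candidates(index, phrase):
    """Docs containing every common-word bigram of the phrase.

    These are only candidates: without positions, a real phrase match
    would still need to be verified.
    """
    grams = bigrams_with_common(phrase.lower().split())
    if not grams:
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))


docs = {
    1: "how at at of a a a",
    2: "a guide of how to cook",
    3: "look at a map",
}
index = build_index(docs)
print(phrase_candidates(index, "HOW at at of a A a"))
```

The point of the sketch is that a bigram such as "at_of" has a far shorter posting list than "at" or "of" alone, so a phrase made entirely of common words can be answered without walking the huge single-word lists.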

But if you do the same search without quotes:

     http://www.google.fr/search?hl=fr&q=HOW+at+at+of+a+A+a&btnG=Rechercher&meta=

they still find many matches (though, curiously, the one result
returned for the phrase search does not seem to make the first page of
the non-phrase search).

So it does seem like Google has no stop words.

It actually makes some sense, because Google obviously has to deal
with terms that are not stop words but still have tremendous frequency
(e.g. "1" and "2", which occur more frequently than "a" or "the") by
scaling out across machines. Since they have already solved that
scale-out problem anyway, the incremental cost of including stop words
is probably minor.
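
As a rough illustration of the scale-out argument, the sketch below assumes a document-partitioned index: each shard indexes only its own slice of the documents, so even the most frequent term has a posting list of roughly (total docs / number of shards) on any one machine. The partitioning scheme and the function names are assumptions chosen for the example, not a description of Google's actual architecture.

```python
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Assign a document to a shard (simple modulo, assumed for the sketch)."""
    return doc_id % num_shards


def build_sharded_index(docs, num_shards):
    """Build one small inverted index per shard, with no stop list at all."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id, text in sorted(docs.items()):
        shard = shards[shard_for(doc_id, num_shards)]
        for term in set(text.lower().split()):
            shard[term].append(doc_id)
    return shards


def search(shards, term):
    """Ask every shard for the term and merge the per-shard hits."""
    hits = []
    for shard in shards:
        hits.extend(shard.get(term, []))
    return sorted(hits)


def longest_posting_list(shards, term):
    """Length of the longest per-shard posting list for a term."""
    return max(len(shard.get(term, [])) for shard in shards)


# Every document contains both "a" and "1", the worst case for frequency.
docs = {i: "a 1 doc%d" % i for i in range(8)}
shards = build_sharded_index(docs, 4)
print(search(shards, "a"), longest_posting_list(shards, "a"))
```

Once the index is split this way to cope with terms like "1" and "2", a stop word such as "a" is handled by exactly the same machinery, which is why keeping it adds little extra cost.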

Mike

David Causse wrote:

> Hi,
>
> Look at this Google query: http://www.google.fr/search?q=%22HOW+at+at+of+a+A+a%22
>
> What do you think this says about stop words?
> Does Google have no stop words?
>
> David.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



