lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: multilingual list of stopwords
Date Thu, 18 Oct 2007 12:52:14 GMT
Are you sure they don't just mean they want separate stopword lists  
for various different indexes in different languages?  Otherwise, I  
agree, it doesn't make much sense for a single mixed language index  
(unless you had an intelligent filter that could select based on  
language.)
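
As a rough sketch of what such a language-selecting filter might look like (plain Java, not the Lucene API): the class name, the method name, and the tiny word lists here are made up for illustration, and the language code is assumed to come from a separate language identification step that is not shown. The sample words are the English/Swedish ones from Andrzej's message quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PerLanguageStopwords {
    // Tiny illustrative lists only; real stopword lists are much longer.
    private static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("is", "by", "the", "a"),
        "sv", Set.of("till", "och", "en"));

    // Drops the tokens that are stopwords in the given language.
    // "lang" is assumed to be the output of a language identifier.
    // Unknown languages fall back to an empty list, so nothing is removed.
    public static List<String> removeStopwords(String lang, List<String> tokens) {
        Set<String> stop = STOPWORDS.getOrDefault(lang, Set.of());
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stop.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("is", "by", "till");
        System.out.println(removeStopwords("en", tokens)); // prints [till]
        System.out.println(removeStopwords("sv", tokens)); // prints [is, by]
    }
}
```

The same three tokens give different results depending on the language, which is why one merged list would either drop Swedish content words or keep English stopwords.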

Maria, perhaps you have specific languages you are looking for?  I  
would just Google for <Language> stopword list and see what comes  
up.  There are a lot of multilingual resources out there.

-Grant

On Oct 18, 2007, at 7:16 AM, Andrzej Bialecki wrote:

> Lukas Vlcek wrote:
>> Hi,
>> I haven't heard of a multilingual stop word list before. What
>> should be the purpose of it? This seems odd to me :-)
>
> That's because a multilingual stopword list doesn't make sense ;)
>
> One example that I'm familiar with: words "is" and "by" in English  
> and in Swedish. Both words are stopwords in English, but they are  
> content words in Swedish (ice and village, respectively).  
> Similarly, "till" in Swedish is a stopword (to, towards), but it's  
> a content word in English.
>
> So, as Lukas correctly suggested, you should first perform language  
> identification, and then apply the correct stopword list.
>
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!
http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
