lucene-java-user mailing list archives

From John Byrne <john.by...@propylon.com>
Subject Re: strange issues with IRISH
Date Mon, 13 Jul 2009 18:46:21 GMT
Hi,

"suspect that [an] is still ignored as a stop word for some reason"

Yes, "an" is still a stop word in English, of course! (e.g. 'an apple')
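To see the effect, here is a toy Java model of an analyzer chain: whitespace split, lower-casing, then a stop filter. It is not Lucene's API. The class name, the analyze() helper and the abbreviated English stop list are all hypothetical, just enough to show that "an" disappears from Irish text exactly as it does from English.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer chain (not Lucene's API):
// whitespace split -> lower-case -> drop stop words.
class StopWordDemo {
    // Hypothetical, abbreviated English stop list; note that "an" is on it.
    static final Set<String> ENGLISH_STOPS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input splits into a single empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The o-acute is written as a Unicode escape to keep the source ASCII-only.
        String irish = "Lean an b\u00f3thar seo";
        // English stop list applied to Irish text: "an" is silently dropped.
        System.out.println(analyze(irish, ENGLISH_STOPS));
        // Empty stop list: every token, including "an", is kept.
        System.out.println(analyze(irish, Collections.<String>emptySet()));
    }
}
```

With the stop list, "Lean an bóthar seo" comes out as lean, bóthar, seo; with an empty set, "an" survives. The same filter also reduces "An bhfuil" to just "bhfuil".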

Your custom analyzer should work; are you making sure to do both your 
indexing *and* your searching with the new analyzer?
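A mismatch between index-time and query-time analysis gives exactly the "returns nothing" symptom. The sketch below is a hypothetical toy inverted index in plain Java (not Lucene classes), assuming the documents were indexed without stop-word removal but the query string went through an English stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index (plain Java, not Lucene) showing what
// happens when index-time and query-time analysis differ.
class AnalyzerMismatchDemo {
    // Hypothetical query-side stop list containing "an".
    static final Set<String> ENGLISH_STOPS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Whitespace split + lower-case, dropping anything in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Map each indexed term to the sorted ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, Set<String> stopWords) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id), stopWords)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // OR query: union of the postings for every term left after analysis.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query, Set<String> stopWords) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query, stopWords)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "ag an gcrosbhealach seo",
                "Lean an b\u00f3thar seo",
                "An bhfuil ... in am imeacht?");
        Set<String> keepAll = Collections.emptySet();
        // Index built without stop-word removal.
        Map<String, Set<Integer>> index = buildIndex(docs, keepAll);
        // Query analyzed with a stop list: "an" is removed, nothing is left to look up.
        System.out.println(search(index, "an", ENGLISH_STOPS)); // prints []
        // Query analyzed the same way as the index.
        System.out.println(search(index, "an", keepAll)); // prints [0, 1, 2]
    }
}
```

The index itself is fine in both runs; only the query-side analysis differs, which is why it pays to pass the very same analyzer to both sides.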

I think making a list of Irish stop words could be tricky, since "an"
sometimes means "the", but sometimes is the question particle in front of
a verb (e.g. "An bhfuil...?")

The safest bet is probably not to bother removing stop words. These days
it doesn't really affect performance much, storage space is generally not
much of an issue, and keeping them makes phrase searching more accurate.

-John
> Hi All,
>
>  
>
> I've come across a very strange issue with the Irish language.
>
> I have the following set of strings in Irish:
>
>  
>
> ag an gcrosbhealach seo, 
>
> Lean ar an mórbhealach., 
>
> Lean an bóthar seo., 
>
> An bhfuil ... in am imeacht?, 
>
> An ... sin an t-am ceart?
>
>  
>
> And here is a search string: an
>
>  
>
> The search returns nothing instead of all of those phrases. I'm using the
> simple analyzer but suspect that [an] is still being ignored as a stop
> word for some reason.
>
> I've tried a custom analyzer with the following code:
>
>  
>
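> public TokenStream tokenStream(String fieldName, Reader reader) {
>     // Split on whitespace only; no stop words are removed
>     TokenStream ts = new WhitespaceTokenizer(reader);
>     // Lower-case every token so "An" and "an" index as the same term
>     ts = new LowerCaseFilter(ts);
>     return ts;
> }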
>
>  
>
> with no luck.
>
>  
>
> Any ideas?
>
>  
>
> Thanks.



