lucene-dev mailing list archives

From Basem Narmok <nar...@gmail.com>
Subject Re: Arabic Analyzer: possible bug
Date Thu, 08 Oct 2009 20:19:25 GMT
Uwe,
100% correct.

On Thu, Oct 8, 2009 at 4:56 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> I think the idea of the lowercase filter in the Arabic analyzers is not
> really to index mixed-language texts. It is more for the case where you
> have some words embedded in the Arabic content (like product names),
> which happens often. You also see this often in Japanese texts. For these
> embedded English fragments you really need no stop word list. And if a
> stop word does appear in them, it is not a real stop word for the target
> language; it may carry additional information. Stop words are removed
> mostly because they are needless (they appear in every text). But if
> "the" appears next to an English word in an Arabic sentence, it is more
> important than all the "the"s in this mail.
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
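A minimal plain-Java sketch of the behavior Uwe describes, assuming an analysis chain that lowercases every token before removing only Arabic stop words. It does not use Lucene's API: the class name `ArabicChainSketch`, the method `analyze`, and the three-entry stop word list are hypothetical, and the normalization and stemming steps of the real Arabic analyzer are left out. The point is that an embedded Latin-script fragment gets lowercased while English stop words such as "the" survive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ArabicChainSketch {

    // Hypothetical, tiny Arabic stop word list for illustration only;
    // a real analyzer ships a much longer list. Entries: "fi" (in),
    // "min" (from), "ala" (on).
    private static final Set<String> ARABIC_STOP_WORDS = Set.of(
            "\u0641\u064A",
            "\u0645\u0646",
            "\u0639\u0644\u0649");

    // Toy pipeline: split on anything that is not a letter or combining mark,
    // lowercase every token, then drop only Arabic stop words. English stop
    // words such as "the" deliberately survive.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{M}]+")) {
            if (token.isEmpty()) {
                continue; // leading delimiters yield an empty first element
            }
            String lower = token.toLowerCase(Locale.ROOT); // no-op for Arabic script
            if (!ARABIC_STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin-script fragment around the Arabic stop word "fi" (in).
        System.out.println(analyze("The Lucene \u0641\u064A the index"));
    }
}
```

Running `main` should print `[the, lucene, the, index]`: the Arabic stop word is dropped, while both occurrences of "the" are kept, lowercased, as potentially informative terms.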


