lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Liaqat Ali <liaqatalim...@gmail.com>
Subject Re: StopWords problem
Date Wed, 26 Dec 2007 22:08:12 GMT
Doron Cohen wrote:
> On Dec 26, 2007 10:33 PM, Liaqat Ali <liaqatalimian@gmail.com> wrote:
>
>   
>> Using javac -encoding UTF-8 still raises the following error.
>>
>> urduIndexer.java : illegal character: \65279
>> ?
>> ^
>> 1 error
>>
>> What I am doing wrong?
>>
>>     
>
> If you have the stop-words in a file, say one word in a line,
> they can be read like this:
>
>     BufferedReader r = new BufferedReader(new InputStreamReader(new
> FileInputStream("Urdu.txt"),"UTF8"));
>     String word = r.readLine();    // loop this line, you get the picture
>
> (Make sure to specify encoding "UTF8" when saving the file from notepad).
>
> Regards,
> Doron
>
>   
Hi, Doron

The compilation problem is solved, but there is no change in the index.


public static final String[] URDU_STOP_WORDS = { "کی" ,"کا" ,"کو" ,"ہے" 
,"کے" ,"نے" ,"پر" ,"اور" ,"سے","میں" ,"بھی"
,"ان" ,"ایک" ,"تھا" ,"تھی" ,"کیا" ,"ہیں" ,"کر" ,"وہ" ,"جس" ,"نہں"
,"تک" };
Analyzer analyzer = new StandardAnalyzer(URDU_STOP_WORDS);


Again these words are appeared in the index with high ranks.

Regards,
Liaqat

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message