lucene-java-user mailing list archives

From "Doron Cohen" <cdor...@gmail.com>
Subject Re: StopWords problem
Date Wed, 26 Dec 2007 21:41:53 GMT
On Dec 26, 2007 10:33 PM, Liaqat Ali <liaqatalimian@gmail.com> wrote:

> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What am I doing wrong?
>

If you keep the stop words in a file, say one word per line,
they can be read like this:

    BufferedReader r = new BufferedReader(
        new InputStreamReader(new FileInputStream("Urdu.txt"), "UTF8"));
    String word = r.readLine();    // call this in a loop until it returns null

(Make sure to select the "UTF-8" encoding when saving the file from Notepad.)
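
A fuller sketch of the same idea, in case it helps. The character code 65279 in the javac error is U+FEFF, the byte-order mark that Notepad writes at the start of a UTF-8 file. Java's UTF-8 decoder does not strip that mark, so a word file saved from Notepad would carry it into the first stop word unless it is removed by hand. The class and method names below (StopWordReader, readStopWords) are made up for illustration and are not part of Lucene; skipping blank lines is also an assumption about how the file is laid out.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class StopWordReader {

    // Reads one stop word per line from an already-decoded character stream.
    // Drops a leading U+FEFF (byte-order mark) and skips blank lines.
    static List<String> readStopWords(Reader in) throws IOException {
        List<String> words = new ArrayList<String>();
        BufferedReader r = new BufferedReader(in);
        boolean firstLine = true;
        String line;
        while ((line = r.readLine()) != null) {
            if (firstLine && line.length() > 0 && line.charAt(0) == '\uFEFF') {
                line = line.substring(1);   // remove the BOM left by the editor
            }
            firstLine = false;
            line = line.trim();
            if (line.length() > 0) {
                words.add(line);
            }
        }
        return words;
    }

    // Convenience overload: opens the file as UTF-8 and closes it afterwards.
    static List<String> readStopWords(String path) throws IOException {
        Reader in = new InputStreamReader(new FileInputStream(path), "UTF-8");
        try {
            return readStopWords(in);
        } finally {
            in.close();
        }
    }
}
```

The resulting list can be turned into an array with words.toArray(new String[0]) and handed to whatever analyzer you use. Reading the words from a data file this way also keeps the Urdu text out of urduIndexer.java, so the source file itself can stay plain ASCII.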

Regards,
Doron
