lucene-java-user mailing list archives

From 李晓峰 <enderwiggin.ri...@gmail.com>
Subject Re: StopWords problem
Date Wed, 26 Dec 2007 21:16:50 GMT
It's Notepad.
It adds a byte order mark (BOM; in this case 65279, or 0xFEFF) at the front
of your file, which javac does not recognize, for reasons not quite clear to me.
Here is the bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
It won't be fixed, so strip the BOM before you compile your code.
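
If re-saving from another editor is not convenient, the BOM can also be removed programmatically. What follows is only a minimal sketch, assuming the file was saved as UTF-8 (where U+FEFF is encoded as the three bytes EF BB BF) and a JDK with java.nio.file; the class name BomStripper and the method stripBom are hypothetical helpers, not part of Lucene or the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene or the JDK.
class BomStripper {
    // UTF-8 encoding of U+FEFF (decimal 65279), the mark Notepad prepends.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the bytes without a leading UTF-8 BOM; returns the input unchanged if there is none. */
    static byte[] stripBom(byte[] data) {
        if (data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(data, UTF8_BOM.length, data.length);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomStripper <source-file>");
            return;
        }
        // Rewrites the file in place, so keep a backup if the source matters.
        Path source = Paths.get(args[0]);
        Files.write(source, stripBom(Files.readAllBytes(source)));
    }
}
```

Run it once on the source file, then compile with javac -encoding UTF-8 as before; the illegal character \65279 should no longer be at the start of the file.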

Liaqat Ali wrote:
> 李晓峰 wrote:
>> "javac" has an option, "-encoding", that tells the compiler which
>> encoding the input source file uses; this will probably solve the
>> problem.
>> Or you can try Unicode escapes (\uxxxx); then you can save the file in
>> ANSI, though it is hard for humans to read.
>> Or use an IDE. Eclipse is a good choice: you can set the source file
>> encoding, and it will take care of the compiler for you.
>>
>> regards.
>>> Hi, Doro Cohen
>>>
>>> Thanks for your reply, but I am facing a small problem here. I am
>>> using Notepad for coding, so in which format should the file be
>>> saved?
>>>
>>>
>>> public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" 
>>> ,"کا" ,"کو" ,"ہے" };
>>>
>>> Analyzer analyzer = new StandardAnalyzer(URDU_STOP_WORDS);
>>>
>>>
>>> If I save it in ANSI format, it loses the contents. I tried
>>> Unicode, but it does not work, and I also tried UTF-8, but it
>>> generates two errors identifying two illegal characters. What
>>> should the solution be? Kindly guide me in this.
>>>
>>> Thanks ..
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
> Hi,
> Thanks a lot for your suggestion.
> Using javac -encoding UTF-8 still raises the following error:
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What am I doing wrong?
>

