lucene-dev mailing list archives

From Joey Lawrance <lawra...@cs.orst.edu>
Subject Re: CJK Support for HTMLParser.jj
Date Tue, 07 Sep 2004 21:43:27 GMT
I got the same warning when I compiled the patch. I haven't yet tried my
patch together with the patch for Bug 30844 (or the latest CVS) to see
whether that removes the warning. I expect it would, but I haven't tested
that theory. I'll get around to it once I finish my current work (using
Lucene to index Japanese documents), which has a looming deadline. :-)

Joey

On Tuesday, September 7, 2004, at 01:19 PM, Daniel Naber wrote:

> On Monday 23 August 2004 13:46, Joey Lawrance wrote:
>
>> I've attached the HTMLParser.jj file that successfully parses Japanese
>> HTML for indexing.
>
> Joey,
>
> thanks for the patch. When I compile it with "ant javacc-HTMLParser" I get
> this warning:
>
> "Warning: Line 364, Column 3: Non-ASCII characters used in regular
> expression. Please make sure you use the correct Reader when you create
> the parser that can handle your character set."
>
> Is it okay to get this warning? The line the warning refers to is this one:
>
> | < CJK:                                          // non-alphabets
>
> Besides that, the patch seems to work, i.e. the parser doesn't stop on
> Japanese HTML files anymore, but that's all I can say, as I don't speak
> Japanese.
>
> Regards
>  Daniel
>
> -- 
> http://www.danielnaber.de
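The JavaCC warning quoted above asks for "the correct Reader" when the parser is created. A minimal sketch of what that means, assuming the Japanese HTML arrives as UTF-8 bytes: wrap the byte stream in an `InputStreamReader` with an explicit `Charset` instead of relying on the platform default encoding. The class and method names (`CharsetAwareReader`, `openReader`, `decode`, `countCjk`) are hypothetical, and the final hand-off to `HTMLParser` assumes a `Reader`-based constructor, which JavaCC-generated parsers usually provide but which this thread does not confirm. The Unicode-block check is only an illustration; the actual character ranges of the patch's `CJK` token are not shown here. Requires Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareReader {

    // Wrap a byte stream in a Reader that decodes with an explicit charset
    // rather than the platform default encoding.
    static Reader openReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Decode a whole byte array through such a Reader.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = openReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Count chars in a few Japanese-relevant Unicode blocks (illustrative
    // subset, not the ranges used by the patch's CJK token).
    static int countCjk(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(text.charAt(i));
            if (block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                    || block == Character.UnicodeBlock.HIRAGANA
                    || block == Character.UnicodeBlock.KATAKANA) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Japanese sample text (3 kanji, 1 hiragana, 4 katakana) written as
        // Unicode escapes so this source file itself stays pure ASCII.
        String html = "<html><body>\u65E5\u672C\u8A9E\u306E\u30C6\u30AD\u30B9\u30C8</body></html>";
        byte[] utf8 = html.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 reader:      " + countCjk(right) + " CJK chars");
        System.out.println("ISO-8859-1 reader: " + countCjk(wrong) + " CJK chars");

        // Hypothetical hand-off, assuming a Reader-based constructor exists:
        // HTMLParser parser = new HTMLParser(openReader(stream, StandardCharsets.UTF_8));
    }
}
```

Decoded with the matching charset, the sample yields 8 CJK characters; decoded as ISO-8859-1, the same bytes turn into 24 Latin-1 characters and none fall in a CJK block. That mismatch is the kind of problem the warning is cautioning about.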



