lucene-java-user mailing list archives

From mchaput <mcha...@aw.sgi.com>
Subject HTMLParser choking on Unicode
Date Tue, 08 Apr 2003 16:54:37 GMT
When I try to index Japanese HTML files using HTMLParser, every file
fails with a lexical error like this:

   Parse Aborted: Lexical error at line 12, column 28.
   Encountered: "\u2030" (8240), after : ""

Is there something I have to do to make HTMLParser work with Unicode?

(I haven't done anything special with readers or encodings, and I don't
really know much about them. Is that the problem?)
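
For reference, U+2030 is the character that the single byte 0x89 becomes
when it is decoded as windows-1252, so the error is at least consistent
with multi-byte Japanese text being read with a Western default encoding.
Below is a minimal sketch of decoding with an explicitly named charset
instead of the platform default. The class and method names
(EncodingSketch, openReader, readAll, decode) are hypothetical, the real
encoding of the files is not known from this message, and whether the
parser in use can be handed a java.io.Reader is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class; not part of Lucene or its demo code.
public class EncodingSketch {

    // Wrap a byte stream in a Reader that decodes with the named charset
    // rather than the platform default.
    static Reader openReader(InputStream in, String charsetName)
            throws UnsupportedEncodingException {
        return new InputStreamReader(in, charsetName);
    }

    // Read every character from the reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Decode a byte array using the named charset.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        try (Reader r = openReader(new ByteArrayInputStream(bytes), charsetName)) {
            return readAll(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // Byte 0x89 decoded as windows-1252 is U+2030, the character in the error.
        String wrong = decode(new byte[] { (byte) 0x89 }, "windows-1252");
        System.out.println(Integer.toHexString(wrong.charAt(0)));

        // Japanese text survives only when decoded with the charset it was
        // encoded in (UTF-8 here, purely as an illustration).
        String text = "\u65e5\u672c\u8a9e";
        byte[] utf8 = text.getBytes("UTF-8");
        System.out.println(text.equals(decode(utf8, "UTF-8")));
        System.out.println(text.equals(decode(utf8, "windows-1252")));
    }
}
```

If the parser accepts a Reader, the result of openReader(new
FileInputStream(file), charsetName) could be passed to it in place of the
raw file, with charsetName set to whatever the files were actually saved
in.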

Thanks,

Matt


-- 
                       |
Matt Chaput           |   A l i a s | W a v e f r o n t
Information Designer  |   210 King St. E. Toronto, ON, Canada M5A 1J7
mchaput@aw.sgi.com    |   (416) 874-8268
                       |
"A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

