Subject: Re: HTMLParser.jj
To: "Lucene Developers List"
From: "Konrad Kolosowski"
Date: Wed, 23 Apr 2003 13:08:28 -0400

Just adding an option UNICODE_INPUT = true;
to HTMLParser.jj, recompiling, and ensuring that the
HTMLParser(java.io.Reader) constructor is used elsewhere in the code
should fix it.

Konrad Kolosowski


mchaput@aw.sgi.com
04/22/2003 12:52 PM
Please respond to "Lucene Developers List"
cc:
Subject: HTMLParser.jj

The demo HTMLParser chokes on Unicode in attribute values. Anyone have
ideas on how to go about patching it? My naive first try was to add
Unicode ranges to the LET token, but I just got "broken pipe" on every
file.

Thanks!

Matt

--
| Matt Chaput | A l i a s | W a v e f r o n t
Information Designer | 210 King St. E. Toronto, ON, Canada M5A 1J7
mchaput@aw.sgi.com | (416) 874-8268
| "A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
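
For anyone following the thread, the second half of the suggestion above (going through the HTMLParser(java.io.Reader) constructor) might look roughly like the sketch below. It is only an illustration under assumptions: the class name ReaderInputSketch, the helpers openAsReader and readAll, and the choice of UTF-8 are not from the original messages, and the call into the demo parser appears only as a comment so the example compiles without Lucene. The idea is presumably that a Reader hands the generated parser characters that are already decoded, rather than raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only. On the grammar side, the option from the message would sit
// in the options block of HTMLParser.jj, along the lines of:
//     options { UNICODE_INPUT = true; }
class ReaderInputSketch {

    // Hypothetical helper: wrap a byte stream in a Reader with an explicit
    // charset, so the consumer sees decoded characters instead of bytes.
    public static Reader openAsReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Hypothetical helper: drain a Reader into a String. It stands in for
    // the parser here so the example has no Lucene dependency.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An attribute value containing a non-ASCII character.
        String html = "<a title=\"caf\u00e9\">link</a>";
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);

        // Assumption: the input is UTF-8. Real pages may declare another
        // encoding, which would have to be detected separately.
        try (Reader reader = openAsReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            // In the Lucene demo, this Reader would be passed on as:
            //     new HTMLParser(reader)
            String decoded = readAll(reader);
            System.out.println("round trip intact: " + decoded.equals(html));
        }
    }
}
```

Decoding the same bytes with the wrong charset (ISO-8859-1, say) would turn the accented character into two unrelated characters, so the charset handed to the Reader matters as much as the grammar option.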