lucene-dev mailing list archives

From "Konrad Kolosowski" <konr...@ca.ibm.com>
Subject Re: HTMLParser.jj
Date Wed, 23 Apr 2003 17:08:28 GMT
Adding the option
  UNICODE_INPUT = true;
to the options block of HTMLParser.jj, regenerating the parser with JavaCC,
and making sure the HTMLParser(java.io.Reader) constructor is used everywhere
else in the code should fix it.
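
For the Reader part, a rough sketch of what I mean, not the demo's actual
code: the class name ReaderDecodingSketch and the helper readAll are made up
for illustration, and UTF-8 is only an assumed encoding for the input files.
The point is that the bytes are decoded into chars by a Reader with an
explicit charset before the parser ever sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class ReaderDecodingSketch {

    // Decodes the stream as UTF-8 (an assumption about the input files)
    // and returns every char the Reader yields.
    static String readAll(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An attribute value containing a non-ASCII character.
        byte[] html = "<a title=\"caf\u00e9\">x</a>".getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(html));
        System.out.println(decoded);
        // In the demo, the Reader itself would be handed to the parser:
        //   HTMLParser parser = new HTMLParser(reader);
    }
}
```

With the Reader constructor the parser works on already-decoded chars, so
the two-byte UTF-8 sequence for the accented letter arrives as one char.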

Konrad Kolosowski


mchaput <mchaput@aw.sgi.com>
04/22/2003 12:52 PM
Please respond to "Lucene Developers List"

To:       Lucene Developers List <lucene-dev@jakarta.apache.org>
cc:
Subject:  HTMLParser.jj

The demo HTMLParser chokes on Unicode in attribute values. Does anyone have
ideas on how to go about patching it?

My naive first try was to add Unicode ranges to the LET token, but I
just got "broken pipe" on every file.

Thanks!

Matt


--
                       |
Matt Chaput           |   A l i a s | W a v e f r o n t
Information Designer  |   210 King St. E. Toronto, ON, Canada M5A 1J7
mchaput@aw.sgi.com    |   (416) 874-8268
                       |
"A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org




