lucene-dev mailing list archives

From joanne.spros...@teamware.co.uk
Subject Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)
Date Fri, 26 Oct 2001 16:42:00 GMT
I'm not sure if this is the answer you are looking for, but I overcame a
similar problem with Finnish characters by modifying the queryparser.jj file to
contain the following lines:

/* ***************** */
/* Token Definitions */
/* ***************** */

<*> TOKEN : {
  <#_ALPHA_CHAR: ["a"-"z", "A"-"Z", "ä", "ö", "Ä", "Ö", "å", "Å"] >
| <#_NUM_CHAR:   ["0"-"9"] >
| <#_ALPHANUM_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "ä", "ö", "Ä", "Ö", "å", "Å" ] >
| <#_IDENTIFIER_CHAR: [ "a"-"z", "A"-"Z", "0"-"9", "_", "ä", "ö", "Ä", "Ö", "å", "Å" ] >
| <#_IDENTIFIER: <_ALPHA_CHAR> (<_IDENTIFIER_CHAR>)* >
| <#_NEWLINE:    ( "\r\n" | "\r" | "\n" ) >
| <#_WHITESPACE: ( " " | "\t" ) >
| <#_QCHAR:      ( "\\" (<_NEWLINE> | ~["a"-"z", "A"-"Z", "0"-"9", "ä", "ö", "Ä", "Ö", "å", "Å"] ) ) >
| <#_RESTOFLINE: (~["\r", "\n"])* >
}

<DEFAULT> TOKEN : {
  <AND:       ("AND" | "&&" | "and") >
| <OR:        ("OR" | "||" | "or") >
| <NOT:       ("NOT" | "!" | "not") >
| <PLUS:      "+" >
| <MINUS:     "-" >
| <LPAREN:    "(" >
| <RPAREN:    ")" >
| <COLON:     ":" >
| <CARAT:     "^" >
| <STAR:      "*" >
| <QUOTED:     "\"" (~["\""])+ "\"">
| <NUMBER:    (<_NUM_CHAR>)+ "." (<_NUM_CHAR>)+ >
| <TERM:      <_IDENTIFIER_CHAR>
              ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "*" ] )* >
}

<DEFAULT> SKIP : {
  <<_WHITESPACE>>
}

<DEFAULT> TOKEN : {
<ALL:       (~[]) >
}
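
To show what the change amounts to, here is a rough Java sketch of the
"may a term start with this character?" test. It is only an illustration, not
the lexer that JavaCC generates: the class and method names are made up, and
the ASCII-only set is my assumption about what the unmodified grammar accepts.
The last method is a possible generalisation using Character.isLetterOrDigit;
it is not something the grammar above does.

```java
// Illustrative sketch only; TermStartCheck and its methods are hypothetical
// names and are not part of Lucene or of the JavaCC-generated token manager.
final class TermStartCheck {

    // The six letters added to the grammar: ä ö Ä Ö å Å (as Unicode escapes
    // so the source compiles regardless of file encoding).
    private static final String EXTRA = "\u00e4\u00f6\u00c4\u00d6\u00e5\u00c5";

    // Assumed behaviour of the unmodified _IDENTIFIER_CHAR: ASCII letters,
    // digits and underscore only.
    static boolean isAsciiIdentifierChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    // Behaviour of the modified _IDENTIFIER_CHAR shown above.
    static boolean isExtendedIdentifierChar(char c) {
        return isAsciiIdentifierChar(c) || EXTRA.indexOf(c) >= 0;
    }

    // A broader alternative: accept any Unicode letter or digit.
    static boolean isUnicodeIdentifierChar(char c) {
        return c == '_' || Character.isLetterOrDigit(c);
    }

    public static void main(String[] args) {
        // o-umlaut (the character from the stack trace), u-umlaut, katakana A
        char[] samples = { '\u00f6', '\u00fc', '\u30a2' };
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c)
                + " ascii=" + isAsciiIdentifierChar(c)
                + " extended=" + isExtendedIdentifierChar(c)
                + " unicode=" + isUnicodeIdentifierChar(c));
        }
    }
}
```

Note that the extended set covers only the six characters listed, so a term
starting with 'ü' or with a Katakana character would still be rejected unless
those characters are added to the grammar as well.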





Doug Cutting  (22/10/2001  16:39):
>Brian,
>
>Do you know what's going on here?  I have not yet had time to look at this.
>If you don't have time, and no one else volunteers, then I will look into
>it.  I would like to fix this for the 1.2 final release, if the change required
>is not major.
>
>Doug
>
>> -----Original Message-----
>> From: Ralf.Zimmermann@cit.de [mailto:Ralf.Zimmermann@cit.de]
>> Sent: Monday, October 22, 2001 8:56 AM
>> To: lucene-dev@jakarta.apache.org
>> Subject: Re: Re: [Lucene-dev] Katakana characters in queries (a bug?)
>>
>>
>>
>>
>> Hi,
>>
>> yes, I can confirm this bug. I have the same problem
>> with query terms starting with German umlauts like 'ä', 'ö'
>> and 'ü':
>>
>> Exception occurred during event dispatching:
>> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1.
>> Encountered: "\u00f6" (246), after : ""
>>      at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>>      at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
>>      ...
>>
>> The problem occurs in Lucene 1.2 RC1 and RC2.
>>
>> Regards,
>> Ralf Zimmermann
>>
>>

------------------------------------------------------------------------

Joanne Sproston | Software Engineer
Teamware Group
joanne.sproston@teamware.co.uk
phone: +44 (0)1782 794879  fax: +44 (0)1782  776667

intra / extra / Internet solutions at www.teamware.com
