lucene-dev mailing list archives

From Halácsy Péter <halacsy.pe...@axelero.com>
Subject RE: Antwort: RE: Re(2): Re: [Lucene-dev] Katakana characters in queries (a bug?)
Date Wed, 31 Oct 2001 09:38:31 GMT


> -----Original Message-----
> From: Ralf.Zimmermann@cit.de [mailto:Ralf.Zimmermann@cit.de]
> 
> Hi again,
> 
> I think I have to correct my previous statement. It seems that the new
> token definitions introduce other problems. The "+" or "-" prefixes to
> search terms no longer work.
> 
> Example:
> The queries 'auch hilfe', '+auch +hilfe' and '+auch -hilfe' are
> returning the same results.
> 
> Ralf Zimmermann
> 
Yes, you are right: "+auch" is now lexed as a single term, not as a PLUS
modifier followed by the term "auch".

Two possible solutions:
1. If a term may not contain any plus or minus sign, simply modify the
_TERM_CHAR definition:
<#_TERM_CHAR: ~["\"", " ", "\t", "(", ")", ":", "&", "|",
                  "^", "*", "?", "~", "{", "}", "[", "]", "+", "-" ] >

That means none of these characters can appear in a term. I think it's a
good solution if we define a term as a sequence of letters. (Note that
_TERM_CHAR can still match non-letter characters, for example $; the
LowerCaseTokenizer of Lucene will cut off the $ sign.)
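
To illustrate the effect of option 1, here is a rough sketch in plain
Java. It is not the JavaCC-generated lexer: the class Option1Sketch and
its methods are made up for this example, and it only mimics the
character exclusion above. Every excluded character other than blank and
tab is emitted as a one-character token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or its QueryParser.
final class Option1Sketch {
    // Characters that option 1 excludes from _TERM_CHAR (copied from the grammar above).
    static final String EXCLUDED = "\" \t():&|^*?~{}[]+-";

    static boolean isTermChar(char c) {
        return EXCLUDED.indexOf(c) < 0;
    }

    // Splits a query into maximal runs of term characters. Blanks and tabs are
    // skipped; any other excluded character becomes a one-character token.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
            } else if (isTermChar(c)) {
                int start = i;
                while (i < query.length() && isTermChar(query.charAt(i))) {
                    i++;
                }
                tokens.add(query.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("+auch -hilfe")); // [+, auch, -, hilfe]
        System.out.println(tokenize("foo-bar"));      // [foo, -, bar]
    }
}
```

The second call shows the price of this option: a hyphen inside a word
also ends the term.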

2. If a term may not start with + or - but may contain them in any
other position, add a new token:
<#_FIRST_TERM_CHAR: ~["\"", " ", "\t", "(", ")", ":", "&", "|",
                  "^", "*", "?", "~", "{", "}", "[", "]", "+", "-" ] >
and modify TERM and WILDTERM:
 <TERM:      <_FIRST_TERM_CHAR> (<_TERM_CHAR>)* >
...
| <WILDTERM:  <_FIRST_TERM_CHAR>
              ( ~["\"", " ", "\t", "(", ")", ":", "&", "|", "^", "~",
                  "{", "}", "[", "]" ] )+ <_TERM_CHAR>>

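A rough sketch of how option 2 behaves, again in plain Java and not the
generated lexer. The class Option2Sketch is made up for this example,
and I assume here that _TERM_CHAR excludes the same characters as
_FIRST_TERM_CHAR except + and -; only the TERM rule is mimicked, not
WILDTERM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or its QueryParser.
final class Option2Sketch {
    // Excluded from _FIRST_TERM_CHAR (copied from the grammar above).
    static final String FIRST_EXCLUDED = "\" \t():&|^*?~{}[]+-";
    // Assumed exclusions for _TERM_CHAR: the same set, but + and - are allowed.
    static final String TERM_EXCLUDED = "\" \t():&|^*?~{}[]";

    static boolean isFirstTermChar(char c) {
        return FIRST_EXCLUDED.indexOf(c) < 0;
    }

    static boolean isTermChar(char c) {
        return TERM_EXCLUDED.indexOf(c) < 0;
    }

    // Mimics TERM: one _FIRST_TERM_CHAR followed by any number of _TERM_CHARs.
    // Blanks and tabs are skipped; any other character that cannot start a
    // term becomes a one-character token.
    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
            } else if (isFirstTermChar(c)) {
                int start = i;
                i++;
                while (i < query.length() && isTermChar(query.charAt(i))) {
                    i++;
                }
                tokens.add(query.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("+auch -hilfe")); // [+, auch, -, hilfe]
        System.out.println(tokenize("+foo-bar"));     // [+, foo-bar]
    }
}
```

Here a leading + or - is split off as a modifier, while a hyphen inside
a word stays part of the term.
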
I hope one of these helps,
peter
