lucene-java-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: WhiteSpace Tokenizer question
Date Tue, 23 Aug 2005 16:31:02 GMT
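It's the QueryParser, not the Analyzer.
The query parser splits the query string on real whitespace before it
hands each piece to the analyzer. When a single piece such as john--a
comes back from the analyzer as multiple tokens, the parser puts them
in a phrase query.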

I think the only way to change that behavior would be to modify the
QueryParser.
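
To illustrate the behavior described above, here is a toy model in plain
Java. It is a sketch, not Lucene's code: the class PhraseQueryDemo, the
methods analyze and parse, and the EXTRA_WHITESPACE set (standing in for
whiteSpaceChars_) are all made up for the example, and it assumes the
parser splits on real whitespace before calling the analyzer on each word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from Lucene.
final class PhraseQueryDemo {

    // Hypothetical stand-in for whiteSpaceChars_ in the custom tokenizer.
    static final Set<Character> EXTRA_WHITESPACE = Set.of('-');

    // Same rule as the custom isTokenChar: whitespace and '-' end a token.
    static boolean isTokenChar(char c) {
        return !(Character.isWhitespace(c) || EXTRA_WHITESPACE.contains(c));
    }

    // Stand-in for the analyzer: split text into runs of token characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stand-in for the parser: split on real whitespace first, then analyze
    // each word. One token becomes a term clause; several tokens from the
    // same word become a single phrase clause.
    static String parse(String field, String queryText) {
        List<String> clauses = new ArrayList<>();
        for (String word : queryText.trim().split("\\s+")) {
            List<String> tokens = analyze(word);
            if (tokens.isEmpty()) {
                continue;
            }
            if (tokens.size() == 1) {
                clauses.add(field + ":" + tokens.get(0));
            } else {
                clauses.add(field + ":\"" + String.join(" ", tokens) + "\"");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(parse("title", "john  a"));  // title:john title:a
        System.out.println(parse("title", "john--a"));  // title:"john a"
    }
}
```

In this model the space is consumed by the parser's own word splitting, so
the analyzer only ever sees "john" and "a" separately. The hyphens reach the
analyzer inside one word, which is why that word turns into a phrase.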

-Yonik

On 8/23/05, Dan Armbrust <daniel.armbrust.list@gmail.com> wrote:
> I wrote a slightly modified version of the WhiteSpaceTokenizer that
> allows me to treat other characters as whitespace.  My thought was that
> this would be an easy way to make it tokenize on characters such as "-".
> 
> My tokenizer looks like this:
> 
> public class CustomWhiteSpaceTokenizer extends CharTokenizer
> {
> 
>     protected boolean isTokenChar(char c)
>     {
>         if (Character.isWhitespace(c) || whiteSpaceChars_.contains(new Character(c)))
>         {
>             return false;
>         }
>         else
>         {
>             return true;
>         }
>     }
> 
> <snip other stuff>
> }
> 
> When I use my Analyzer which uses this tokenizer in the QueryParser with
> the character "-" defined as whitespace, the following query gets parsed
> like this:
> 
> "title:(john  a) body:(john  a) " -> (title:john title:a) (body:john body:a)
> 
> which is what I expect.  But then the following query:
> 
> "title:(john--a) body:(john--a) " -> title:"john a" body:"john a"
> 
> Isn't what I want.  I can't seem to figure out why it is behaving
> differently on these characters (space vs hyphen) when I am specifying
> them both as a non-token.
> 
> This is with the svn trunk as of yesterday.
> Any help appreciated,
> 
> Thanks,
> 
> Dan
> 
> --
> ****************************
> Daniel Armbrust
> Biomedical Informatics
> Mayo Clinic Rochester
> daniel.armbrust(at)mayo.edu
> http://informatics.mayo.edu/
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>
