lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: how to remove the dash
Date Tue, 26 Jun 2012 03:25:26 GMT
Oops... I was mistaken to suggest that "a simple term query" would invoke
the field analyzer. It passes the literal text through without invoking any
field analyzer.

-- Jack Krupansky

-----Original Message----- 
From: Jack Krupansky
Sent: Monday, June 25, 2012 10:14 PM
To: java-user@lucene.apache.org
Subject: Re: how to remove the dash

Most query parsers will "parse" a leading hyphen as an operator, so it will
never get to the analyzer for any field. Whether white space is permitted
between the "-" operator and the following term is dependent on the specific
query parser, and not guaranteed.

So, "bebidas - agua" is parsed by the query parser the same as
"bebidas -agua", which is the "prohibit" operator. This is all as it should
be.

Generally, all operators, including "+", "-", parentheses, "AND", "OR", etc.,
need to be escaped if you want them to be passed through to the field
analyzers. Operators embedded within terms do not need to be escaped, except
for parentheses.

So, if you want user input to be treated as raw English text, as opposed to
a "structured" query, be sure to filter or escape the user query text before
parsing it. Or, consider using a simple term query that does no query
"parsing", but does pass the term through the field analyzer for the desired
field type.
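
As a rough illustration of the "escape the user query text" option, here is a
minimal, dependency-free sketch in Java. The class name QueryEscaper and the
list of special characters are assumptions made for this example, not code
from Lucene. Lucene's QueryParser also offers a static escape(String) method
for the same purpose, which is probably the better choice in real code. Note
that character escaping like this does not cover keyword operators such as
"AND" and "OR".

```java
// Hand-rolled sketch for illustration only; not Lucene's own implementation.
final class QueryEscaper {
    // Characters assumed to be special in the classic query syntax.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|/";

    // Prefix every special character with a backslash so the query parser
    // treats it as literal text instead of as an operator.
    static String escape(String userInput) {
        StringBuilder out = new StringBuilder(userInput.length() + 8);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The hyphen is escaped, so it should no longer act as "prohibit".
        System.out.println(escape("bebidas - agua")); // prints: bebidas \- agua
    }
}
```

The escaped string would then be handed to the query parser in place of the
raw user input. Whether the escaped hyphen survives analysis depends on the
analyzer configured for the field.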

-- Jack Krupansky

-----Original Message----- 
From: listas@alphamatrix.org
Sent: Monday, June 25, 2012 4:12 PM
To: java-user@lucene.apache.org
Subject: Re: how to remove the dash

More information...
If I change
System.out.println("Query: " + query.toString("contents"));
to this:
System.out.println("Query: " + query.toString());
I get this result:
"Query: contents:bebidas -contents:agua"

As I have already tried many different Analyzers and always get the same
result, maybe it's a problem in the query parser?


On Monday, 25 June 2012 21:10:02, listas@alphamatrix.org wrote:
> You are right... I'm not getting the hyphen inside any token... but it is
> still used as the "prohibit operator".
>
> This is my output:
> Test: bebidas - agua
> Query: bebidas -agua
> Tokens:
> 1: [bebidas:0->7:<ALPHANUM>]
> 2: [agua:10->14:<ALPHANUM>]
>
> Test is the original string.
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



