lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
Date Fri, 23 Feb 2007 20:10:05 GMT


Michael Busch commented on LUCENE-800:

Hi Dilip,

the backslash is the escape character in Lucene's query parser syntax, so if you want to
search for a literal backslash you have to escape it. That means that the first two examples
you provided are working as expected:

item:\\ -> item:\ is correct
item:\\* -> item:\* is correct too

If you want to search for two backslashes you have to escape both, meaning you have to put
four backslashes in the query string:
item:\\\\* -> item:\\*
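
To make the two layers of escaping concrete, here is a minimal sketch in plain Java. The class EscapeDemo and its unescape helper are hypothetical and are not Lucene's implementation; they only mimic the rule described above (one backslash escapes the character that follows it). Note that each backslash in the query string has to be doubled again inside a Java string literal.

```java
// Hypothetical illustration only: this is NOT Lucene's QueryParser code.
class EscapeDemo {

    // Strips one level of backslash escaping: a backslash is dropped and the
    // character after it is kept literally. A trailing lone backslash is dropped.
    static String unescape(String query) {
        StringBuilder out = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (!escaped && c == '\\') {
                escaped = true;      // escape character itself is not emitted
            } else {
                out.append(c);       // literal character (possibly an escaped one)
                escaped = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Java literal "item:\\\\" is the query string item:\\ (two backslashes).
        String[] queries = {"item:\\\\", "item:\\\\*", "item:\\\\\\\\*"};
        for (String q : queries) {
            System.out.println(q + " -> " + unescape(q));
        }
        // Prints:
        // item:\\ -> item:\
        // item:\\* -> item:\*
        // item:\\\\* -> item:\\*
    }
}
```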

But you did indeed find two other problems. You are right, the last example should not throw
a ParseException.
In (item:\\ item:ABCD\\) the query parser wrongly thinks that the closing parenthesis is escaped,
when in fact the backslash before it is itself the escaped character. I will provide a patch for this problem.
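
One way to picture the intended behavior is to count the backslashes directly in front of a character: an odd count means the character is escaped, an even count means the backslashes pair up and escape each other. The sketch below is a hypothetical stand-alone helper (EscapeCheck.isEscaped is an assumed name, not part of Lucene and not the planned patch).

```java
// Hypothetical illustration only: not the actual fix for LUCENE-800.
class EscapeCheck {

    // Returns true if the character at 'index' is preceded by an odd number
    // of consecutive backslashes, i.e. it is escaped.
    static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        // Query string: (item:\\ item:ABCD\\)
        String q1 = "(item:\\\\ item:ABCD\\\\)";
        // Query string: (item:ABCD\)
        String q2 = "(item:ABCD\\)";
        System.out.println(isEscaped(q1, q1.length() - 1)); // false: two backslashes pair up
        System.out.println(isEscaped(q2, q2.length() - 1)); // true: one backslash escapes ')'
    }
}
```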

And as you said, the third example should throw a ParseException because there are too many
closing parentheses. There is already a patch for this problem in JIRA:

I will commit fixes for both problems soon. 
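
As a rough sketch of the expected outcome for the last two examples, the following plain-Java check counts only unescaped parentheses. ParenBalance.isBalanced is a hypothetical helper written for this illustration; it is neither of the patches mentioned above, and it deliberately ignores quoted phrases and the rest of the query syntax.

```java
// Hypothetical illustration only: ignores quoted phrases and other syntax.
class ParenBalance {

    // Returns true if the unescaped parentheses in the query are balanced.
    static boolean isBalanced(String query) {
        int depth = 0;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++;                 // skip the escaped character entirely
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false;    // more closing than opening parentheses
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // Query string: (item:\\ item:ABCD\\))  -> one ')' too many
        System.out.println(isBalanced("(item:\\\\ item:ABCD\\\\))")); // false
        // Query string: (item:\\ item:ABCD\\)   -> balanced
        System.out.println(isBalanced("(item:\\\\ item:ABCD\\\\)"));  // true
    }
}
```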

Thanks again, Dilip! Good catches :-)

> Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
> ----------------------------------------------------------------------------------------------------
>                 Key: LUCENE-800
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: QueryParser
>            Reporter: Dilip Nimkar
>         Assigned To: Michael Busch
> Test code and output follow. Tested with Lucene 1.9 only. Affects those who index or search for Lucene's reserved characters.
> Description: When an input search string has a sequence of N (java-escaped) backslashes, where N >= 2, the QueryParser will produce a query in which that sequence has N-1 backslashes.
>     Analyzer analyzer = new WhitespaceAnalyzer();
>     String[] queryStrs = {"item:\\\\",
>                           "item:\\\\*",
>                           "(item:\\\\ item:ABCD\\\\))",
>                           "(item:\\\\ item:ABCD\\\\)"};
>     for (String queryStr : queryStrs) {
>       System.out.println("--------------------------------------");
>       System.out.println("String queryStr = " + queryStr);
>       Query luceneQuery = null;
>       try {
>         luceneQuery = new QueryParser("_default_", analyzer).parse(queryStr);
>         System.out.println("luceneQuery.toString() = " + luceneQuery.toString());
>       } catch (Exception e) {
>         System.out.println(e.getClass().toString());
>       }
>     }
> OUTPUT (with remarks in comment notation):
> --------------------------------------
> String queryStr = item:\\
> luceneQuery.toString() = item:\             //One backslash has disappeared. Searcher will fail on this query.
> --------------------------------------
> String queryStr = item:\\*
> luceneQuery.toString() = item:\*           //One backslash has disappeared. This query will search for something unintended.
> --------------------------------------
> String queryStr = (item:\\ item:ABCD\\))
> luceneQuery.toString() = item:\ item:ABCD\)     //This should have thrown a ParseException because of an unescaped ')'. It did not.
> --------------------------------------
> String queryStr = (item:\\ item:ABCD\\)
> class org.apache.lucene.queryParser.ParseException        //...and this one should not have, but it did.

