lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: QueryParser handling of backslash characters
Date Wed, 20 Jul 2005 19:37:45 GMT

On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:

> Hi,
>
> I'm seeing some strange behavior in the way the QueryParser handles
> consecutive backslash characters.  I know that backslash is the escape
> character in Lucene, and so I would expect "\\\\" to match fields that
> have two consecutive backslashes, but this does not seem to be the
> case.
>
> The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
> The only way I can get my query to find the record containing that
> value is to type "FieldName:\\\192.168.0.15\\public" (three slashes).
> Why is the third backslash character not treated as an escape?  Is it
> just that any backslash that is preceded by a backslash is interpreted
> as a literal backslash character, regardless of whether the "escape"
> backslash was itself escaped?
>
> I can code around this, but it seems inconsistent with the way that
> escape characters usually work.  Is this a bug, or is it intentional,
> or am I missing something?

I've waited until I had a chance to experiment with this before  
replying.  I say that this is a bug.  There is a private method in  
QueryParser called discardEscapeChar (shown below).  I copied it to a  
JUnit test case and gave it this assert:

     assertEquals("\\\\\\\\192.168.0.15\\\\public", discardEscapeChar("\\\\192.168.0.15\\\\public"));

This test fails with:

     Expected:\\\\192.168.0.15\\public
     Actual  :\192.168.0.15\public

That is wrong in my opinion (though my head hurts thinking about
meta-escaping backslashes in Java code to make this a proper test).

The bug is isolated to the discardEscapeChar() method where it eats  
too many backslashes.  Could you have a shot at tweaking that method  
to do the right thing and submit a patch?

   private String discardEscapeChar(String input) {
     char[] caSource = input.toCharArray();
     char[] caDest = new char[caSource.length];
     int j = 0;
     for (int i = 0; i < caSource.length; i++) {
       if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
         caDest[j++]=caSource[i];
       }
     }
     return new String(caDest, 0, j);
   }
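
For comparison, here is a minimal sketch of one possible fix. It is not a committed patch. It assumes the intended rule is that a backslash escapes exactly the next character, so an escaped backslash yields one literal backslash. The class name EscapeSketch and the "escaped" flag are hypothetical, introduced only for illustration. The difference from the method above is that it tracks whether the previous backslash was already consumed as an escape, rather than only looking at the preceding character.

```java
class EscapeSketch {
    // Hypothetical variant of discardEscapeChar: drops each escaping
    // backslash and keeps the character that follows it.
    static String discardEscapeChar(String input) {
        StringBuilder dest = new StringBuilder(input.length());
        // true while the previous character was a backslash acting as an escape
        boolean escaped = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && !escaped) {
                // start of an escape sequence: drop this backslash
                escaped = true;
            } else {
                // an ordinary character, or the character being escaped
                dest.append(c);
                escaped = false;
            }
        }
        // Assumption: a lone trailing backslash escapes nothing and is dropped.
        return dest.toString();
    }
}
```

Under that assumption, four consecutive backslashes in the query text reduce to two, so the query text FieldName:\\\\192.168.0.15\\public would unescape to the UNC path \\192.168.0.15\public. The original method, given the same four backslashes, returns three, because it keeps every backslash that follows another backslash; that is also why the three-backslash form happens to produce two.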

Erik



