lucene-java-user mailing list archives

From Eyal <eyal.j...@gmail.com>
Subject RE: QueryParser handling of backslash characters
Date Wed, 20 Jul 2005 20:58:55 GMT
I think this should work:

(Originally written in C#, so could someone please check that it compiles?
I don't have a Java compiler here.)

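    // Returns input with each escaping backslash removed; the character
    // following an escape is copied as-is, so "\\\\" yields one backslash.
    private String discardEscapeChar(String input)
    {
      char[] caSource = input.toCharArray();
      char[] caDest = new char[caSource.length];
      int j = 0;

      for (int i = 0; i < caSource.length; i++)
      {
        if (caSource[i] == '\\')
        {
          // Skip the escape itself; a trailing lone backslash is dropped.
          if (caSource.length == ++i)
            break;
        }
        // Copy either a plain character or the escaped one.
        caDest[j++]=caSource[i];
      }
      return new String(caDest, 0, j);
    }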
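
To make the difference concrete, here is a minimal sketch that runs the
quoted QueryParser method and the version above side by side. The class name
DiscardEscapeDemo and the method names oldDiscard and newDiscard are made up
for this illustration; only the method bodies come from this thread.

```java
public class DiscardEscapeDemo {
    // Behaviour of the method quoted from QueryParser in this thread:
    // keep a char if it is not a backslash, or if the previous char was one.
    static String oldDiscard(String input) {
        char[] src = input.toCharArray();
        char[] dest = new char[src.length];
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            if (src[i] != '\\' || (i > 0 && src[i - 1] == '\\')) {
                dest[j++] = src[i];
            }
        }
        return new String(dest, 0, j);
    }

    // Proposed behaviour: a backslash consumes exactly one following char.
    static String newDiscard(String input) {
        char[] src = input.toCharArray();
        char[] dest = new char[src.length];
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            if (src[i] == '\\' && ++i == src.length) {
                break; // trailing lone backslash: nothing left to copy
            }
            dest[j++] = src[i];
        }
        return new String(dest, 0, j);
    }

    public static void main(String[] args) {
        // Fully escaped UNC path: four backslashes, host, two backslashes.
        String escaped = "\\\\\\\\192.168.0.15\\\\public";
        System.out.println(oldDiscard(escaped)); // \\\192.168.0.15\public
        System.out.println(newDiscard(escaped)); // \\192.168.0.15\public

        // The three-backslash query from the original report.
        String reported = "\\\\\\192.168.0.15\\\\public";
        System.out.println(oldDiscard(reported)); // \\192.168.0.15\public
    }
}
```

The last line suggests why the three-backslash query happened to match: the
old method drops only the first backslash of each run, so three backslashes
come out as two.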
 

Regarding your unit test, I think it's wrong:

>      assertEquals("\\\\\\\\192.168.0.15\\\\public", 
> discardEscapeChar ("\\\\192.168.0.15\\\\public"));

It should be:

    assertEquals("\\\\192.168.0.15\\public",
        discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));

I would also suggest adding the following:

    String s="\\\\some.host.name\\dir+:+-!():^[]\\{}~*?";
    assertEquals(s,discardEscapeChar(escape(s)));
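
As a rough, self-contained sketch of that round-trip check: the escape method
below is a hypothetical stand-in that puts a backslash in front of each
character in a fixed list. It is not Lucene's own escape implementation, and
the list of special characters is an assumption made for this example. The
class name EscapeRoundTrip is also made up.

```java
public class EscapeRoundTrip {
    // Assumed set of query-syntax characters, for illustration only.
    private static final String SPECIAL = "\\+-!():^[]{}~*?";

    // Hypothetical stand-in for an escape helper: prefixes each special
    // character (including the backslash itself) with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // The unescaping logic proposed in this message.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\' && ++i == caSource.length) {
                break; // trailing lone backslash is dropped
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        String s = "\\\\some.host.name\\dir+:+-!():^[]\\{}~*?";
        // Unescaping an escaped string should give back the original.
        System.out.println(s.equals(discardEscapeChar(escape(s)))); // true
    }
}
```

The round trip holds because every backslash in the escaped string is either
an escape or the character being escaped, so consuming them in pairs restores
the input.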

Eyal

> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
> Sent: Wednesday, July 20, 2005 22:38 PM
> To: java-user@lucene.apache.org
> Subject: Re: QueryParser handling of backslash characters
> 
> 
> On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:
> 
> > Hi,
> >
> > I'm seeing some strange behavior in the way the QueryParser handles
> > consecutive backslash characters.  I know that backslash is the escape
> > character in Lucene, and so I would expect "\\\\" to match fields that
> > have two consecutive backslashes, but this does not seem to be the
> > case.
> >
> > The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
> > The only way I can get my query to find the record containing that
> > value is to type "FieldName:\\\192.168.0.15\\public" (three slashes).
> > Why is the third backslash character not treated as an escape?  Is it
> > just that any backslash that is preceded by a backslash is interpreted
> > as a literal backslash character, regardless of whether the "escape"
> > backslash was itself escaped?
> >
> > I can code around this, but it seems inconsistent with the way that
> > escape characters usually work.  Is this a bug, or is it intentional,
> > or am I missing something?
> 
> I've waited until I had a chance to experiment with this 
> before replying.  I say that this is a bug.  There is a 
> private method in QueryParser called discardEscapeChar (shown 
> below).  I copied it to a JUnit test case and gave it this assert:
> 
>      assertEquals("\\\\\\\\192.168.0.15\\\\public", 
> discardEscapeChar ("\\\\192.168.0.15\\\\public"));
> 
> This test fails with:
> 
>      Expected:\\\\192.168.0.15\\public
>      Actual  :\192.168.0.15\public
> 
> Which is wrong in my opinion.  (though my head hurts thinking 
> about metaescaping backslashes in Java code to make this a 
> proper test)
> 
> The bug is isolated to the discardEscapeChar() method where 
> it eats too many backslashes.  Could you have a shot at 
> tweaking that method to do the right thing and submit a patch?
> 
>    private String discardEscapeChar(String input) {
>      char[] caSource = input.toCharArray();
>      char[] caDest = new char[caSource.length];
>      int j = 0;
>      for (int i = 0; i < caSource.length; i++) {
>        if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
>          caDest[j++]=caSource[i];
>        }
>      }
>      return new String(caDest, 0, j);
>    }
> 
> Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

