lucene-java-user mailing list archives

From Luis Pureza <pur...@gmail.com>
Subject Re: Lucene QueryParser/Analyzer inconsistency
Date Wed, 18 Jun 2014 14:38:19 GMT
Thanks, that did work.
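For anyone who lands on this thread later: the ordering Jack describes below can be demonstrated with plain Java. This is only an illustration of the escape-then-strip sequence; `escapeSlashes` and `stripEscapes` are hypothetical helpers standing in for `QueryParser.escape` and the parser's own escape removal, not real Lucene methods.

```java
// Sketch of why a "\\/" mapping rule never fires: escaping adds the
// backslash, but the query parser strips it again *before* the
// analyzer sees the term. Helper names here are illustrative only.
public class EscapeOrder {

    // Stands in for QueryParser.escape, for '/' only.
    static String escapeSlashes(String s) {
        return s.replace("/", "\\/");
    }

    // Stands in for the parser's escape removal in step 1.
    static String stripEscapes(String s) {
        return s.replace("\\/", "/");
    }

    public static void main(String[] args) {
        String escaped = escapeSlashes("one/two");      // "one\/two"
        String seenByAnalyzer = stripEscapes(escaped);  // "one/two"

        // The analyzer's mapping key "\/" can never match, because the
        // backslash is already gone by the time the analyzer runs:
        System.out.println(seenByAnalyzer.contains("\\/")); // false
        System.out.println(seenByAnalyzer);                 // one/two
    }
}
```

Which is why mapping a plain `/` (no backslash) in the char map is the fix.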



On Tue, Jun 17, 2014 at 8:49 PM, Jack Krupansky <jack@basetechnology.com>
wrote:

> Yeah, this is kind of tricky and confusing! Here's what happens:
>
> 1. The query parser "parses" the input string into individual source
> terms, each delimited by white space. The escape is removed in this
> process, but... no analyzer has been called at this stage.
>
> 2. The query parser (generator) calls the analyzer for each source term.
> Your analyzer is called at this stage, but... the escape is already gone,
> so... the <backslash><slash> mapping rule is not triggered, leaving the
> slash recorded in the source term from step 1.
>
> You do need the backslash in your original query because a slash
> introduces a regex query term. It is added by the escape method you call,
> but the escaping will be gone by the time your analyzer is called.
>
> So, just try a simple, unescaped slash in your char mapping table.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Luis Pureza
> Sent: Tuesday, June 17, 2014 1:43 PM
> To: java-user@lucene.apache.org
> Subject: Lucene QueryParser/Analyzer inconsistency
>
>
> Hi,
>
> I'm experiencing some puzzling behaviour with the QueryParser and was hoping
> someone around here could help me.
>
> I have a very simple Analyzer that tries to replace forward slashes (/) with
> spaces. Because QueryParser forces me to escape strings with slashes before
> parsing, I added a MappingCharFilter to the analyzer that replaces "\/"
> with a single space. The analyzer is defined as follows:
>
> @Override
> protected TokenStreamComponents createComponents(String field, Reader in) {
>    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>    builder.add("\\/", " ");
>    Reader mappingFilter = new MappingCharFilter(builder.build(), in);
>
>    Tokenizer tokenizer = new WhitespaceTokenizer(version, mappingFilter);
>    return new TokenStreamComponents(tokenizer);
> }
>
> Then I use this analyzer in the QueryParser to parse a string with slashes:
>
> String text = QueryParser.escape("one/two");
> QueryParser parser = new QueryParser(Version.LUCENE_48, "f", new
> MyAnalyzer(Version.LUCENE_48));
> System.err.println(parser.parse(text));
>
> The expected output would be
>
> f:one f:two
>
> However, I get:
>
> f:one/two
>
> The puzzling thing is that when I debug the analyzer, it tokenizes the
> input string correctly, returning two tokens instead of one.
>
> What is going on?
>
> Many thanks,
>
> Luís Pureza
>
> P.S.: I was able to fix this issue temporarily by creating my own tokenizer
> that tokenizes on whitespace and slashes. However, I still don't understand
> what's going on.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
