lucene-java-user mailing list archives

From atawfik <contact.txl...@gmail.com>
Subject Re: KeywordAnalyzer still getting tokenized on spaces
Date Tue, 09 Sep 2014 07:36:56 GMT
The QueryParser result is confusing because you assume that the query
parser hands the whole query string to the analyzer. That is not the case.
The query parser first parses the query string according to its own syntax,
and only then applies the analyzer.

In other words, the query parser first splits the query string on spaces,
so you get three terms: 1023, 4567 and 8765. You can see this in the output
of the second query, which has three boolean clauses instead of one. Only
after parsing the query does the query parser apply the analyzer, one term
at a time.
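
To make the order of operations concrete, here is a minimal toy model in
plain Java. It is not Lucene code: the class ParseThenAnalyzeSketch, its
methods and the KEYWORD_ANALYZER stand-in are hypothetical names invented
for this illustration, and the real QueryParser does much more than split
on whitespace. It only shows why an analyzer that keeps its input as a
single token still yields three clauses when the splitting happens first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Toy model (NOT Lucene code) of the behaviour described above:
 * the parser splits on whitespace first, then analyzes each chunk.
 */
public class ParseThenAnalyzeSketch {

    /** Hypothetical stand-in for KeywordAnalyzer: the whole input becomes one token. */
    public static final Function<String, String> KEYWORD_ANALYZER = text -> text;

    /** Mimics the query parser: split on whitespace, then analyze every chunk separately. */
    public static List<String> parseThenAnalyze(String queryString,
                                                Function<String, String> analyzer) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : queryString.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                clauses.add(analyzer.apply(chunk));
            }
        }
        return clauses;
    }

    /** Mimics bypassing the parser: the whole string goes through the analyzer once. */
    public static List<String> analyzeOnly(String queryString,
                                           Function<String, String> analyzer) {
        List<String> terms = new ArrayList<>();
        terms.add(analyzer.apply(queryString));
        return terms;
    }

    public static void main(String[] args) {
        String q = "1023 4567 8765";
        // Three separate clauses, although the "analyzer" never splits anything.
        System.out.println(parseThenAnalyze(q, KEYWORD_ANALYZER)); // [1023, 4567, 8765]
        // A single term when the parser is not involved.
        System.out.println(analyzeOnly(q, KEYWORD_ANALYZER));      // [1023 4567 8765]
    }
}
```

The second call corresponds to what the two solutions below do: they
build the term without letting the parser see the spaces.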

To fix that, you have two solutions: 

1- Use a TermQuery directly instead of going through the query parser. In
this case, no analyzer is applied, so the string must match the indexed
term exactly.
     Query currQuery = new TermQuery(new Term("sn", currQueryStr));
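2- Analyze the query string yourself, then create the TermQuery from the
resulting token:
      TokenStream ts = theAnalyzer.tokenStream("sn", new StringReader(currQueryStr));
      CharTermAttribute ca = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      // KeywordAnalyzer emits the whole input as a single token, so one call
      // is enough; fall back to the raw string if no token is produced.
      String query = ts.incrementToken() ? ca.toString() : currQueryStr;
      ts.end();
      ts.close();
      Query currQuery = new TermQuery(new Term("sn", query));
      System.out.println(currQuery.getClass() + ", " + currQuery);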

I am not aware of any way to achieve that with QueryParser itself. Maybe
someone here can correct me.

Regards
Ameer



--
View this message in context: http://lucene.472066.n3.nabble.com/KeywordAnalyzer-still-getting-tokenized-on-spaces-tp4157537p4157560.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
