lucene-java-user mailing list archives

From "Chad Small" <>
Subject RE: Query syntax on Keyword field question
Date Wed, 24 Mar 2004 13:39:33 GMT
Great info Morus,
After making the "escape the dash" change to the QueryParser:
Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
      Hits hits =;
      System.out.println("query.ToString = " + query.toString("description"));
      assertEquals("HW-NCI_TOPICS kept as-is",
                   "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
      // Note: this passes only with the escape put in, so not "as-is".
      assertEquals("doc found!", 1, hits.length());
I'm still getting this output:
query.ToString = +category:HW\-NCI_TOPICS +space
junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>
It looks like the bug, <>, was fixed today:
------- Additional Comments From Otis Gospodnetic <>  2004-03-24 10:10 -------

Although tft-monitor should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to tft AND NOT monitor (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
But I assume this fix won't come out for some time.  Is there a way I can get this fix sooner?
I'm up against a deadline and would very much like this functionality. 
Going one step further with the KeywordAnalyzer that I wrote, I changed this method to skip
the escape character:
    protected boolean isTokenChar(char c) {
         // Treat the backslash (the query escape character) as a non-token character.
         if (c == '\\')
            return false;
         return true;
    }
The test then returns with a space:
query.ToString = +category:"HW -NCI_TOPICS" +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
Actual  :+category:"HW -NCI_TOPICS" +space   <----note space where escape was.
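
One plausible reading of the space in the output above, assuming the KeywordAnalyzer's tokenizer follows the usual character-tokenizer pattern (collect runs of token characters, break at anything else): once the backslash is a non-token character, HW\-NCI_TOPICS is split into two tokens, and two tokens for one term would explain the quoted phrase. The sketch below is a plain-Java stand-in, not Lucene code; the class name CharSplitSketch and the tokenize helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplitSketch {

    // Same rule as the modified isTokenChar above: every character is part
    // of a token except the backslash.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Hypothetical stand-in for a character-based tokenizer: collect runs of
    // token characters and close the current token at each non-token character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal "HW\\-NCI_TOPICS" is the text HW\-NCI_TOPICS.
        System.out.println(tokenize("HW\\-NCI_TOPICS")); // prints [HW, -NCI_TOPICS]
    }
}
```

If that reading is right, dropping the backslash inside the analyzer cannot yield the single term HW-NCI_TOPICS; the escape would have to be removed without ending the token.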

	-----Original Message----- 
	From: Morus Walter [] 
	Sent: Wed 3/24/2004 1:43 AM 
	To: Lucene Users List 
	Subject: RE: Query syntax on Keyword field question

	Chad Small writes:
	> Here is my attempt at a KeywordAnalyzer - although it is not working. Excuse the length of the message, but wanted to give actual code.
	> With this output:
	> Analzying "HW-NCI_TOPICS"
	>  org.apache.lucene.analysis.WhitespaceAnalyzer:
	>  org.apache.lucene.analysis.SimpleAnalyzer:
	>   [hw] [nci] [topics]
	>  org.apache.lucene.analysis.StopAnalyzer:
	>   [hw] [nci] [topics]
	>  org.apache.lucene.analysis.standard.StandardAnalyzer:
	>   [hw] [nci] [topics]
	>  healthecare.domain.lucenesearch.KeywordAnalyzer:
	> query.ToString = category:HW -"nci topics" +space
	> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
	> Expected:+category:HW-NCI_TOPICS +space
	> Actual  :category:HW -"nci topics" +space
	Well, the query parser currently does not allow `-' within words.
	So before your analyzer is called, the query parser reads one word HW, a `-'
	operator, and one word NCI_TOPICS.
	The latter is analyzed as "nci topics" because it's not in field category
	anymore, I guess.
	I suggested to change this. See
	Either you escape the - using category:HW\-NCI_TOPICS in your query
	(untested. and I don't know where the escape character will be removed)
	or you apply my suggested change.
	Another option for using keywords with query parser might be adding a
	keyword syntax to the query parser.
	Something like category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".