lucene-java-user mailing list archives

From "Chad Small" <Chad.Sm...@definityhealth.com>
Subject RE: Query syntax on Keyword field question
Date Wed, 24 Mar 2004 13:39:33 GMT
Great info Morus,
 
After making the "escape the dash" change to the QueryParser:
 
      Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
                                      "description",
                                      analyzer);
      Hits hits = searcher.search(query);
      System.out.println("query.ToString = " + query.toString("description"));
      // Note: this assertion passes only with the escape included in the
      // expected string, so the term is not really kept "as-is".
      assertEquals("HW-NCI_TOPICS kept as-is",
                   "+category:HW\\-NCI_TOPICS +space", query.toString("description"));
      assertEquals("doc found!", 1, hits.length());
 
I'm still getting this output:
 
 domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
 
query.ToString = +category:HW\-NCI_TOPICS +space
 
junit.framework.AssertionFailedError: doc found! expected:<1> but was:<0>
 
It looks like bug 27491, http://issues.apache.org/bugzilla/show_bug.cgi?id=27491, was fixed today:
 
------- Additional Comments From Otis Gospodnetic <otis@apache.org> 2004-03-24 10:10 -------

Although tft-monitor should not really result in a phrase query "tft monitor", I
agree that this is better than converting it to tft AND NOT monitor (tft -monitor).
Moreover, I have seen query syntax where '-' characters are used for phrase
queries instead or in addition to quotes, so one could use either morus-walter
or "morus walter".

I applied your change, as it doesn't look like it breaks anything, and I hope
nobody relied on ill behaviour where tft-monitor would result in AND NOT query.
-----------
But I assume this fix won't come out for some time.  Is there a way I can get this fix sooner?
 
I'm up against a deadline and would very much like this functionality. 
 
Going one step further with the KeywordAnalyzer that I wrote, I changed this method so that
the escape character is not treated as part of the token:
    protected boolean isTokenChar(char c)
    {
        // Treat every character except the backslash escape as part of the token.
        return c != '\\';
    }
The test then fails, with a space where the escape was:
 healthecare.domain.lucenesearch.KeywordAnalyzer:
  [HW-NCI_TOPICS] 
query.ToString = +category:"HW -NCI_TOPICS" +space
junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
Expected:+category:HW\-NCI_TOPICS +space
Actual  :+category:"HW -NCI_TOPICS" +space   <----note space where escape was.
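One possible reading of this output, sketched as a toy and not taken from Lucene itself: if the analyzer receives the term with the backslash still in it, and the backslash is the only non-token character, then a character-based tokenizer would split the term into two tokens at that position, and two tokens for one term would explain the quoted phrase. The class and method names below are hypothetical, and the tokenizer is a plain-Java stand-in rather than Lucene's own tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a character-based tokenizer; this is NOT Lucene code.
class EscapeSplitSketch {

    // Same rule as the isTokenChar method above: everything except '\' is a token character.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Collects runs of token characters; any non-token character ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("HW\\-NCI_TOPICS")); // prints [HW, -NCI_TOPICS]
        System.out.println(tokenize("HW-NCI_TOPICS"));   // prints [HW-NCI_TOPICS]
    }
}
```

Under that assumption the backslash acts as a separator rather than being silently dropped, which matches the space seen in "HW -NCI_TOPICS".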
thanks,
chad.

	-----Original Message----- 
	From: Morus Walter [mailto:morus.walter@tanto.de] 
	Sent: Wed 3/24/2004 1:43 AM 
	To: Lucene Users List 
	Cc: 
	Subject: RE: Query syntax on Keyword field question
	
	

	Chad Small writes:
	> Here is my attempt at a KeywordAnalyzer, although it is not working. Excuse the length
	> of the message, but I wanted to give actual code.
	> 
	> With this output:
	> 
	> Analzying "HW-NCI_TOPICS"
	>  org.apache.lucene.analysis.WhitespaceAnalyzer:
	>   [HW-NCI_TOPICS]
	>  org.apache.lucene.analysis.SimpleAnalyzer:
	>   [hw] [nci] [topics]
	>  org.apache.lucene.analysis.StopAnalyzer:
	>   [hw] [nci] [topics]
	>  org.apache.lucene.analysis.standard.StandardAnalyzer:
	>   [hw] [nci] [topics]
	>  healthecare.domain.lucenesearch.KeywordAnalyzer:
	>   [HW-NCI_TOPICS]
	> 
	> query.ToString = category:HW -"nci topics" +space
	>
	> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
	> Expected:+category:HW-NCI_TOPICS +space
	> Actual  :category:HW -"nci topics" +space
	> 
	
	Well, the query parser does not currently allow `-' within words.
	So before your analyzer is called, the query parser reads one word HW, a `-'
	operator, and one word NCI_TOPICS.
	The latter is analyzed as "nci topics" because it's no longer in the field
	category, I guess.
	
	I have suggested changing this. See
	http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
	
	Either you escape the - using category:HW\-NCI_TOPICS in your query
	(untested, and I don't know where the escape character will be removed)
	or you apply my suggested change.
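The escaping option could be applied programmatically before the text reaches the query parser. The following is a minimal sketch in plain Java; the helper name escapeDashes is hypothetical and not part of Lucene, and it assumes the input term contains no dashes that are already escaped. As the follow-up test at the top of this message shows, the backslash may survive into the parsed term, so escaping alone may not be enough on this Lucene version.

```java
// Hypothetical helper, not part of Lucene: escapes '-' so that the query
// parser does not read it as the prohibit operator.
class DashEscapeSketch {

    // Puts a backslash in front of every '-' in the term.
    // Assumes the term does not already contain escaped dashes.
    static String escapeDashes(String term) {
        return term.replace("-", "\\-");
    }

    public static void main(String[] args) {
        String queryText = "+category:" + escapeDashes("HW-NCI_TOPICS") + " AND SPACE";
        System.out.println(queryText); // prints +category:HW\-NCI_TOPICS AND SPACE
    }
}
```

The resulting string is the same query text that is passed to QueryParser.parse in the test above.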
	
	Another option for using keywords with query parser might be adding a
	keyword syntax to the query parser.
	Something like category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".
	
	HTH
	        Morus
	
	
	
