lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject RE: Query syntax on Keyword field question
Date Wed, 24 Mar 2004 13:55:25 GMT
If you can't wait for a release, you'll have to check out Lucene
directly from CVS, or get one of the nightly builds.

Otis
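
One plausible reading of the `"HW -NCI_TOPICS"` result in the quoted message below: if the backslash is still in the text when it reaches the analyzer, and the analyzer starts a new token at every character for which isTokenChar returns false, then the modified method splits the term at the escape. The sketch below only simulates that rule in plain Java. The class name and the tokenize helper are hypothetical stand-ins and are not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

class SplitOnBackslashSketch {

    // Same rule as the quoted isTokenChar: every character is part of a
    // token except the backslash.
    static boolean isTokenChar(char c) {
        return c != '\\';
    }

    // Hypothetical stand-in for a character-based tokenizer: collects runs
    // of token characters and ends the current token at each other character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal "HW\\-NCI_TOPICS" is the text HW\-NCI_TOPICS.
        System.out.println(tokenize("HW\\-NCI_TOPICS")); // prints [HW, -NCI_TOPICS]
    }
}
```

Two tokens for a single term would be consistent with the phrase query shown in the test output, with a space where the escape was.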

--- Chad Small <Chad.Small@definityhealth.com> wrote:
> Great info Morus,
>  
> After making the "escape the dash" change to the QueryParser:
>  
>       Query query = QueryParser.parse("+category:HW\\-NCI_TOPICS AND SPACE",
>                                       "description",
>                                       analyzer);
>       Hits hits = searcher.search(query);
>       System.out.println("query.ToString = " + query.toString("description"));
>       // Note: this passes only with the escape included, so not "as-is".
>       assertEquals("HW-NCI_TOPICS kept as-is",
>                    "+category:HW\\-NCI_TOPICS +space",
>                    query.toString("description"));
>       assertEquals("doc found!", 1, hits.length());
>  
> I'm still getting this output:
>  
>  domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
>  
> query.ToString = +category:HW\-NCI_TOPICS +space
>  
> junit.framework.AssertionFailedError: doc found! expected:<1> but
> was:<0>
>  
> It looks like bug
> http://issues.apache.org/bugzilla/show_bug.cgi?id=27491 was fixed
> today:
>  
> ------- Additional Comments From Otis Gospodnetic
> <mailto:otis@apache.org>  2004-03-24 10:10 -------
> 
> Although tft-monitor should not really result in a phrase query "tft
> monitor", I agree that this is better than converting it to tft AND NOT
> monitor (tft -monitor).
> Moreover, I have seen query syntax where '-' characters are used for
> phrase queries instead of or in addition to quotes, so one could use
> either morus-walter or "morus walter".
> 
> I applied your change, as it doesn't look like it breaks anything, and I
> hope nobody relied on ill behaviour where tft-monitor would result in an
> AND NOT query.
> -----------
> But I assume this fix won't come out for some time.  Is there a way I
> can get this fix sooner?  
> I'm up against a deadline and would very much like this
> functionality. 
>  
> And to go one more step with the KeywordAnalyzer that I wrote,
> changing this method to skip the escape:
>     protected boolean isTokenChar(char c)
>     {
>         // Every character except the backslash escape is part of the token.
>         return c != '\\';
>     }
> The test then returns with a space:
>  healthecare.domain.lucenesearch.KeywordAnalyzer:
>   [HW-NCI_TOPICS] 
> query.ToString = +category:"HW -NCI_TOPICS" +space
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is 
> Expected:+category:HW\-NCI_TOPICS +space
> Actual  :+category:"HW -NCI_TOPICS" +space   <---- note the space where the escape was.
> thanks,
> chad.
> 
> 	-----Original Message----- 
> 	From: Morus Walter [mailto:morus.walter@tanto.de] 
> 	Sent: Wed 3/24/2004 1:43 AM 
> 	To: Lucene Users List 
> 	Cc: 
> 	Subject: RE: Query syntax on Keyword field question
> 	
> 	
> 
> 	Chad Small writes:
> 	> Here is my attempt at a KeywordAnalyzer, although it is not working.
> 	> Excuse the length of the message, but I wanted to give actual code.
> 	> 
> 	> With this output:
> 	> 
> 	> Analzying "HW-NCI_TOPICS"
> 	>  org.apache.lucene.analysis.WhitespaceAnalyzer:
> 	>   [HW-NCI_TOPICS]
> 	>  org.apache.lucene.analysis.SimpleAnalyzer:
> 	>   [hw] [nci] [topics]
> 	>  org.apache.lucene.analysis.StopAnalyzer:
> 	>   [hw] [nci] [topics]
> 	>  org.apache.lucene.analysis.standard.StandardAnalyzer:
> 	>   [hw] [nci] [topics]
> 	>  healthecare.domain.lucenesearch.KeywordAnalyzer:
> 	>   [HW-NCI_TOPICS]
> 	> 
> 	> query.ToString = category:HW -"nci topics" +space
> 	>
> 	> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> 	> Expected:+category:HW-NCI_TOPICS +space
> 	> Actual  :category:HW -"nci topics" +space
> 	> 
> 	
> 	Well, the query parser does not currently allow `-' within words.
> 	So before your analyzer is called, the query parser reads one word HW,
> 	a `-' operator, and one word NCI_TOPICS.
> 	The latter is analyzed as "nci topics" because it's no longer in field
> 	category, I guess.
> 	
> 	I suggested changing this. See
> 	http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
> 	
> 	Either you escape the - using category:HW\-NCI_TOPICS in your query
> 	(untested, and I don't know where the escape character will be removed)
> 	or you apply my suggested change.
> 	
> 	Another option for using keywords with the query parser might be adding
> 	a keyword syntax to the query parser, something like
> 	category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".
> 	
> 	HTH
> 	        Morus
> 	
> 
> ---------------------------------------------------------------------
> 	To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> 	For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 	
> 	
> 
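
For reference, the "escape the dash" workaround suggested above can be applied to the query text before it is handed to the query parser. This is a minimal sketch in plain Java: escapeDashes is a hypothetical helper written for illustration and is not part of Lucene. Note that in the test output quoted earlier in this thread, the escaped query parsed as a single term but still returned no hits, so escaping on its own may not be enough in this release.

```java
class DashEscapeSketch {

    // Hypothetical helper: put a backslash in front of every '-' so that the
    // query parser can read it as part of the term rather than as an operator.
    static String escapeDashes(String term) {
        StringBuilder escaped = new StringBuilder(term.length() + 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '-') {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        String queryText = "+category:" + escapeDashes("HW-NCI_TOPICS") + " AND SPACE";
        System.out.println(queryText); // prints +category:HW\-NCI_TOPICS AND SPACE
    }
}
```

Only the keyword value is escaped here, so the leading `+` on the field clause keeps its meaning as an operator.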

