lucene-dev mailing list archives

From "Terje Eggestad (JIRA)" <>
Subject [jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
Date Thu, 19 Aug 2010 13:22:20 GMT


Terje Eggestad commented on LUCENE-1486:


I'm about to begin using the ComplexPhraseQueryParser with 3.0.2, as we need wildcards in phrases
and proximity searches.

Our customers have a habit of including '-' in phrases, which seems to trigger a bug:

If you add the following tests to the TestComplexPhraseQueryParser class:

		checkMatches("\"joe john nosuchword\"", "");  
		checkMatches("\"joe-john-nosuchword\"", "");  
		checkMatches("\"john-nosuchword smith\"", "");  

AND add a rewrite() in checkMatches() just after the parse:
		Query q = qp.parse(qString);
		IndexReader reader = searcher.getIndexReader();  // needed for rewrite
		q = q.rewrite(reader);

The first two are OK and are rewritten to:

spanNear([name:joe, name:john, name:nosuchword], 0, true)
name:"joe john nosuchword"

The third bombs out with:

java.lang.IllegalArgumentException: Unknown query type ""
found in phrase query string "john-nosuchword smith"
	at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(
	at org.apache.lucene.queryParser.TestComplexPhraseQuery.checkMatches(

I made a fix that *seems* to fix it, but I feel I'm on very shaky ground here.
I've made so many debugging hacks that I can't make a proper patch, but I added this
fix to ComplexPhraseQueryParser.ComplexPhraseQuery.rewrite(),
just before the place where the exception is thrown:

       } else {
           if (qc instanceof TermQuery) {
               TermQuery tq = (TermQuery) qc;
               allSpanClauses[i] = new SpanTermQuery(tq.getTerm());

// START  FIX "A-B C" phrases
           } else if (qc instanceof PhraseQuery) {
               // Turn the PhraseQuery clause into an ordered, zero-slop
               // span over its terms.
               PhraseQuery pq = (PhraseQuery) qc;
               Term[] subterms = pq.getTerms();

               SpanQuery[] clauses = new SpanQuery[subterms.length];
               for (int j = 0; j < subterms.length; j++) {
                   clauses[j] = new SpanTermQuery(subterms[j]);
               }
               allSpanClauses[i] = new SpanNearQuery(clauses, 0, true);
// END  FIX
           } else {

               throw new IllegalArgumentException("Unknown query type \""
                       + qc.getClass().getName()
                       + "\" found in phrase query string \""
                       + phrasedQueryStringContents + "\"");
           }
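
To illustrate the shape of the rewritten query that the fix above is aiming for, here is a minimal, self-contained sketch. It does not use Lucene at all: PhraseToSpanSketch, Clause, toSpan() and rewritePhrase() are hypothetical stand-ins that only build the printed form of the span queries. It also assumes that the analyzer splits "john-nosuchword" into the two terms "john" and "nosuchword", which is what the PhraseQuery branch above suggests but which has not been verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in model only: these classes are hypothetical and are NOT Lucene's API.
public class PhraseToSpanSketch {

    // One clause inside the quoted phrase: a single term, or several terms
    // when a token such as "john-nosuchword" is assumed to have been split.
    public static final class Clause {
        final List<String> terms;

        public Clause(String... terms) {
            this.terms = Arrays.asList(terms);
        }
    }

    // Renders one clause: a bare term for a single-term clause, otherwise an
    // ordered, zero-slop spanNear over the sub-terms (mirrors the fix above).
    public static String toSpan(String field, Clause clause) {
        if (clause.terms.isEmpty()) {
            throw new IllegalArgumentException("Empty clause");
        }
        if (clause.terms.size() == 1) {
            return field + ":" + clause.terms.get(0);
        }
        List<String> parts = new ArrayList<>();
        for (String term : clause.terms) {
            parts.add(field + ":" + term);
        }
        return "spanNear([" + String.join(", ", parts) + "], 0, true)";
    }

    // Renders the whole phrase as an ordered spanNear over its clauses.
    public static String rewritePhrase(String field, int slop, Clause... clauses) {
        List<String> spans = new ArrayList<>();
        for (Clause clause : clauses) {
            spans.add(toSpan(field, clause));
        }
        return "spanNear([" + String.join(", ", spans) + "], " + slop + ", true)";
    }

    public static void main(String[] args) {
        // Models the phrase "john-nosuchword smith" on the field "name".
        System.out.println(rewritePhrase("name", 0,
                new Clause("john", "nosuchword"), new Clause("smith")));
    }
}
```

In this model the hyphenated token becomes a nested spanNear inside the outer one, so the sub-terms must be adjacent and in order while the outer phrase keeps its own slop. Whether real Lucene span queries print exactly this way for the nested case is not confirmed by this message.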

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>                 Key: LUCENE-1486
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0
>         Attachments:, junit_complex_phrase_qp_07_21_2009.patch,
> junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch,
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to
> allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser
> itself. This works as a proof of concept for much of the query parser syntax. Examples from
> the Junit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
