lucene-dev mailing list archives

From "Luis Alves (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
Date Wed, 22 Jul 2009 21:25:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:24 PM:
-------------------------------------------------------------

Mark H - 

Question 1)

I added docs 5 and 6:
{code:title=TestComplexPhraseQuery.java|borderStyle=solid}
...
  DocData docsContent[] = { new DocData("john smith", "1"),
      new DocData("johathon smith", "2"),      
      new DocData("john percival smith goes on  a b c vacation", "3"),
      new DocData("jackson waits tom", "4"),
      new DocData("johathon smith john", "5"),
      new DocData("johathon mary gomes smith", "6"),
      };
...
{code}

For the test
    checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic with

should document 5 be returned as well, or only doc 2?
I'm assuming position is always important, so doc 5 is supposed to be returned.
Is this the correct behavior?
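
To make the two readings of -john concrete, here is a minimal Java sketch. It is a toy model over whitespace-split tokens, not the ComplexPhraseQueryParser implementation. The class and method names are hypothetical, and it matches the literal term "smith" in the second slot purely as a simplifying assumption.

```java
// Toy model only: hypothetical names, no Lucene classes involved.
class NegationReadings {

    // First phrase slot for (jo* -john): prefix "jo", but not the exact term "john".
    static boolean firstSlot(String token) {
        return token.startsWith("jo") && !token.equals("john");
    }

    // Position-level reading: -john only constrains the token sitting in the first slot.
    static boolean positionLevel(String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (firstSlot(tokens[i]) && tokens[i + 1].equals("smith")) {
                return true;
            }
        }
        return false;
    }

    // Document-level reading: any "john" anywhere in the text rejects the document.
    static boolean documentLevel(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.equals("john")) {
                return false;
            }
        }
        return positionLevel(text);
    }

    public static void main(String[] args) {
        String doc2 = "johathon smith";
        String doc5 = "johathon smith john";
        System.out.println("doc 2: position=" + positionLevel(doc2)
                + " document=" + documentLevel(doc2));
        System.out.println("doc 5: position=" + positionLevel(doc5)
                + " document=" + documentLevel(doc5));
    }
}
```

In this model doc 2 matches under both readings, while doc 5 matches only under the position-level one, which is the case the question is about.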

Question 2)
Should these two queries behave the same once we fix the problem?
    // checkMatches("\"john -percival\"", "1"); // not logic doesn't work
    // checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)
For the query:
checkMatches("\"jo*  smith\"~2", "1,2,3,5"); // position logic works.
Doc 6 is also returned, so the position logic does not seem to be working.
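
As a way to check the expectation, here is a small Java sketch of the position arithmetic. It assumes that the slop is the number of tokens allowed between an in-order jo* term and "smith". That is an assumption about the intended semantics, not a statement of what the parser does, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: assumes slop = tokens allowed between the two in-order terms.
class SlopSketch {

    // Test documents as {id, text}, copied from the docsContent array above.
    static final String[][] DOCS = {
        {"1", "john smith"},
        {"2", "johathon smith"},
        {"3", "john percival smith goes on  a b c vacation"},
        {"4", "jackson waits tom"},
        {"5", "johathon smith john"},
        {"6", "johathon mary gomes smith"},
    };

    // Smallest number of tokens between a jo* token and a later "smith", or -1 if none.
    static int minGap(String text) {
        String[] tokens = text.trim().split("\\s+");
        int best = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].startsWith("jo")) {
                continue;
            }
            for (int j = i + 1; j < tokens.length; j++) {
                if (tokens[j].equals("smith")) {
                    int gap = j - i - 1;
                    if (best < 0 || gap < best) {
                        best = gap;
                    }
                    break; // the nearest following "smith" gives the smallest gap for this i
                }
            }
        }
        return best;
    }

    // Ids of all documents whose smallest gap fits within the given slop.
    static List<String> matches(int slop) {
        List<String> ids = new ArrayList<>();
        for (String[] doc : DOCS) {
            int gap = minGap(doc[1]);
            if (gap >= 0 && gap <= slop) {
                ids.add(doc[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("slop 1: " + String.join(",", matches(1)));
        System.out.println("slop 2: " + String.join(",", matches(2)));
    }
}
```

Under this assumption doc 6 has exactly two tokens between "johathon" and "smith", so it falls inside a slop of 2 and drops out only at a slop of 1. If the intended semantics are different, the expected result of "1,2,3,5" would still hold.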

Question 4)
The usage of AND and AND_NEXT_TO is confusing to me.
The query
checkMatches("\"(jo* AND mary)  smith\"", "1,2,5"); // boolean logic with

returns 1,2,5 and not 6, but I was expecting only 6 to be returned.
It seems like the AND is converted into an OR.
What is the behavior you want to implement?
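
The observation that the AND behaves like an OR can be checked with a toy model. The Java sketch below treats the phrase as two adjacent slots and compares an OR reading of the first slot with a strict single-position AND reading. It is not the parser's code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: the phrase is "<first slot> smith" with the two tokens adjacent.
class FirstSlotReadings {

    // Test documents as {id, text}, copied from the docsContent array above.
    static final String[][] DOCS = {
        {"1", "john smith"},
        {"2", "johathon smith"},
        {"3", "john percival smith goes on  a b c vacation"},
        {"4", "jackson waits tom"},
        {"5", "johathon smith john"},
        {"6", "johathon mary gomes smith"},
    };

    // OR reading: the token before "smith" is either a jo* term or "mary".
    static final Predicate<String> OR_SLOT =
        t -> t.startsWith("jo") || t.equals("mary");

    // Strict AND reading: one token would have to be both a jo* term and "mary".
    static final Predicate<String> AND_SLOT =
        t -> t.startsWith("jo") && t.equals("mary");

    // Ids of documents containing a first-slot token immediately followed by "smith".
    static List<String> matchIds(Predicate<String> firstSlot) {
        List<String> ids = new ArrayList<>();
        for (String[] doc : DOCS) {
            String[] tokens = doc[1].trim().split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (firstSlot.test(tokens[i]) && tokens[i + 1].equals("smith")) {
                    ids.add(doc[0]);
                    break;
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("OR reading:  " + String.join(",", matchIds(OR_SLOT)));
        System.out.println("AND reading: " + String.join(",", matchIds(AND_SLOT)));
    }
}
```

In this model the OR reading gives 1,2,5, which is what the test returns, and the single-position AND reading gives no documents at all. Neither reading produces doc 6, so the expected result needs a third definition of AND inside a phrase.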




  
> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...
