lucene-dev mailing list archives

From "Adriano Crestani (JIRA)" <>
Subject [jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
Date Thu, 23 Jul 2009 00:11:14 GMT


Adriano Crestani commented on LUCENE-1486:

I propose doing this using the new QP implementation. (I can write the new JavaCC QP
for this)
(this implies that the code will be in contrib in 2.9 and be part of core in 3.0)

That would be good!

Granted, the test fails for a reason other than the one for which I wanted it to fail.
We can probably strike the test and leave a note saying phrase-within-a-phrase just does not
make sense and is not supported.

Cool, I agree to remove it. But I still don't see how a user can type a phrase inside a phrase
with the current syntax definition. Can you give me an example?

In brackets it's an OR - the brackets are used to suggest that the current phrase element
at position X is composed of some choices that are evaluated as a subclause in the same way
that in normal query logic sub-clauses are defined in brackets e.g. +a +(b OR c). There seems
to be a reasonable logic to this.

Ideally the ComplexPhraseQueryParser should explicitly turn this setting on while evaluating
the bracketed innards of phrases just in case the base class has AND as the default.

If we use the JavaCC code Luis suggested, we would already have a query parser
that throws ParseExceptions whenever the user types an AND inside a phrase.

OR,||,+, AND, && ..... ignored

So we should throw an exception if any of these is found inside a phrase. It could confuse
the user if we just ignore them.

    Question 2)
    Should these 2 queries behave the same when we fix the problem
    // checkMatches("\"john -percival\"", "1"); // not logic doesn't work
    // checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

I suppose there's an open question as to if the second example is legal (the brackets are

Yes, the second is unnecessary, but I don't think it's illegal. The user could type <(smith)>
outside the phrase, it makes sense to support it inside also.

    Question 3)
    checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.
    doc 6 is also returned, so this feature does not seem to be working.

That looks like a bug related to slop factor?

I have not checked yet, but I think it's working fine. The slop is the number of position
moves allowed among the terms inside the phrase for the query to still match. It matches doc 6
because the term <smith> moves two positions to the right and so matches "johathon mary gomes
smith". Two moves = slop 2 :)
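
A minimal sketch of the slop reading above, in plain Java rather than Lucene's real scorer.
It assumes whitespace tokenization, a two-term in-order phrase, and a prefix test standing in
for jo*; the class and method names are hypothetical and only illustrate the counting.

```java
import java.util.Arrays;
import java.util.List;

public class SlopSketch {
    // Returns true if a token starting with `prefix` is followed by the exact
    // token `term` after skipping at most `slop` intermediate positions.
    // Assumption: only in-order matches are considered (no term reordering).
    public static boolean matches(String doc, String prefix, String term, int slop) {
        List<String> tokens = Arrays.asList(doc.split("\\s+"));
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).startsWith(prefix)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                // In an exact phrase `term` sits at i + 1; every extra position is one move.
                int moves = j - (i + 1);
                if (moves > slop) {
                    break;
                }
                if (tokens.get(j).equals(term)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc6 = "johathon mary gomes smith";
        System.out.println(matches(doc6, "jo", "smith", 2)); // prints "true"
        System.out.println(matches(doc6, "jo", "smith", 1)); // prints "false"
    }
}
```

Under these assumptions doc 6 needs exactly two moves, so it matches with ~2 and stops
matching with ~1.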

ANDs are ignored and turned into ORs (see earlier comments) but maybe a query parse error
should be thrown to emphasise this.

I think we could support AND also. I agree there are few cases where the user would use that.
It would work as I explained before:

What happens if I type "(query AND parser) lucene"? In my view it means "(query AND
parser) AND_NEXT_TO lucene", that is: find any document that contains both the term
'query' and the term 'parser' at position x, and the term 'lucene' at position x+1.
Is this the expected behaviour?
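
To make the proposed AND_NEXT_TO reading concrete, here is a small sketch in plain Java. It is
not the parser's implementation: it assumes a toy positional index (term -> set of positions)
in which several terms may share one position, and all names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PositionalAndSketch {
    // Builds a toy positional index: the i-th array lists the terms at position i.
    public static Map<String, Set<Integer>> index(String[]... termsAtPosition) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int pos = 0; pos < termsAtPosition.length; pos++) {
            for (String term : termsAtPosition[pos]) {
                idx.computeIfAbsent(term, k -> new TreeSet<>()).add(pos);
            }
        }
        return idx;
    }

    // Positions at which every one of the given terms occurs (positional AND).
    public static Set<Integer> allAt(Map<String, Set<Integer>> idx, String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> positions = idx.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(positions);
            } else {
                result.retainAll(positions);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // "(query AND parser) lucene": both terms at position x, 'lucene' at x + 1.
    public static boolean matches(Map<String, Set<Integer>> idx) {
        Set<Integer> next = idx.getOrDefault("lucene", Collections.emptySet());
        for (int x : allAt(idx, "query", "parser")) {
            if (next.contains(x + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 'query' and 'parser' share position 0 (e.g. injected as synonyms).
        System.out.println(matches(index(
            new String[] {"query", "parser"}, new String[] {"lucene"}))); // prints "true"
        // Ordinary text "query parser lucene": the two terms never share a position.
        System.out.println(matches(index(
            new String[] {"query"}, new String[] {"parser"}, new String[] {"lucene"}))); // prints "false"
    }
}
```

Under this reading, plain running text can only match when two terms are indexed at the same
position, which is one reason the AND case would be rare in practice.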

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>                 Key: LUCENE-1486
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>         Attachments:, junit_complex_phrase_qp_07_21_2009.patch,
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to
> allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser
> itself. This works as a proof of concept for much of the query parser syntax. Examples from
> the Junit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...
