lucene-dev mailing list archives

From "Adriano Crestani (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
Date Wed, 22 Jul 2009 00:27:14 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adriano Crestani updated LUCENE-1486:
-------------------------------------

    Attachment: junit_complex_phrase_qp_07_21_2009.patch

Thanks for the quick response Mark!

OK, I'm now trying to figure out what is supported by reading only the junits, and I ran into
some issues:

What do you mean in the last check by "phrase inside phrase"? I don't see any phrase inside
a phrase (I'm also not sure what that would look like, because there are no distinct open and
close phrase delimiters). All I see is a phrase <"jo*">, followed by a term <smith> and an empty
phrase <" ">. The check passes because the query parser throws an exception complaining
about the empty phrase, which does not seem to be supported. When I changed the empty phrase to a
valid phrase, the query parsed fine (so the test case failed). As I said, I'm not sure what
exactly you were trying to do there; could you give me more explanation about that?

I'm also getting a java.util.ConcurrentModificationException when I put an escaped double
quote inside a phrase. So I suppose it's not supported, but shouldn't it throw a better exception?
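
For illustration only, here is a minimal sketch of how a ConcurrentModificationException
typically arises in plain Java, together with the kind of up-front check that would give a
clearer error. It is not the parser's actual code and not a diagnosis of this bug: the class
CmeSketch, both method names, and the use of IllegalArgumentException (standing in for a
parse exception) are all assumptions.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical stand-in, unrelated to the real ComplexPhraseQueryParser internals.
class CmeSketch {

    // Adds to a list while iterating over it, which is the usual cause of this exception.
    static boolean triggersCme() {
        List<String> phrases = new ArrayList<>();
        phrases.add("jo* \\\"smith"); // phrase text containing an escaped double quote
        phrases.add("other");
        try {
            for (String p : phrases) {
                if (p.contains("\\\"")) {
                    phrases.add("nested"); // structural modification during iteration
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Assumed validation step: reject an escaped double quote with a descriptive message.
    static void rejectEscapedQuotes(String phrase) {
        for (int i = 0; i < phrase.length() - 1; i++) {
            if (phrase.charAt(i) == '\\') {
                if (phrase.charAt(i + 1) == '"') {
                    throw new IllegalArgumentException(
                        "Escaped double quote at offset " + i
                        + " is not supported inside a phrase: " + phrase);
                }
                i++; // skip the escaped character
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("CME triggered: " + triggersCme());
        try {
            rejectEscapedQuotes("jo* \\\"smith");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early with a message that names the unsupported construct would be easier to act on
than an exception leaking out of a collection iterator.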

I also have an issue with the parse exceptions: when one comes from inside a phrase, it does
not report the correct position in the query string. I think it treats the beginning of the
phrase as the beginning of the query, and it prints only the phrase that contains the problem.
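
As a rough sketch of the position fix being asked for, assuming the inner parse reports a
0-based offset relative to the phrase contents and that the offset of the opening quote in
the full query is known, the two could be combined as below. The class PhraseErrorPosition
and the method toQueryColumn are hypothetical names, not part of the patch.

```java
// Hypothetical helper, not part of ComplexPhraseQueryParser.
class PhraseErrorPosition {

    /**
     * Maps an offset inside the phrase contents to an offset in the full query.
     *
     * @param phraseStart    0-based index of the opening double quote in the full query
     * @param columnInPhrase 0-based offset of the error within the text between the quotes
     */
    static int toQueryColumn(int phraseStart, int columnInPhrase) {
        // +1 skips the opening quote, which is not part of the phrase contents.
        return phraseStart + 1 + columnInPhrase;
    }

    public static void main(String[] args) {
        String query = "name:\"jo* [sma TO smZ]\"";
        int phraseStart = query.indexOf('"');   // opening quote of the phrase
        int columnInPhrase = 4;                 // offset of '[' inside: jo* [sma TO smZ]
        int column = toQueryColumn(phraseStart, columnInPhrase);
        System.out.println("Error at offset " + column + " of the full query: " + query);
    }
}
```

With the full query string and the translated offset in the message, the reported position
would line up with what the user actually typed.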

I'm attaching some changes I made to the TestComplexPhraseQuery junit that show the problems
I'm getting; I think they are easier to understand if you read and run it.

Sorry for so many questions, but I'm just trying to understand exactly what this query parser
supports and what it does not.

Thanks,
Adriano Crestani Campos

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
>                      TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to
> allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser
> itself. This works as a proof of concept for much of the query parser syntax. Examples from
> the Junit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

