lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Miller" <markrmil...@gmail.com>
Subject Re: QueryParser Is Badly Broken
Date Thu, 12 Oct 2006 23:55:07 GMT
There is also the Surround Query Parser in contrib by the way...I would bet
that Paul will tell you that it does not have these issues. I can't wait to
see the replies on this one...I didn't realize that the QueryParser had
these problems and am a bit skeptical...unfortunately I am away from home
and cannot check it out.

On another note...http://famestalker.com/devwik/ will be done soon...I only
have not gotten around to finishing the final touches because there did not
appear to be a lot of initial interest (and what there was has waned
drastically) and I am not ready to use it myself yet. It does correctly
handle order of operations however, and as far as I know is the only parser
to handle arbitray nesting and mixing of boolean and proximity queries.
(perhaps surround does as well...I would be really interested to know, but I
assume that it handles only the base cases ie not "(car & basket) within 2
of (horse & carriage within 3 of car). Of course who really cares about such
queries, but hey ;)

You'll get better advice from others more experienced, but my bet is that
Paul's surround parser is top notch and correctly does what you want.

- Mark

>

On 10/12/06, Renaud Waldura <renaud+lucene@waldura.com> wrote:
>
> I'm developing an application used by scientists -- people who have a
> pretty
> good idea of what logic is -- and they were shocked to find out that
> neither
> of these queries return the same results:
>
> 1- banana AND apple OR orange
> 2- banana AND (apple OR orange)
> 3- (banana AND apple) OR orange
>
> I'd expect (1) to be either (2) or (3), but it turns out it's parsed as
> "+banana apple orange". I was rather, uh, dismayed by this find, as it
> doesn't seem to make sense.
>
> I just spent half a day reading up on the various ways QueryParser is
> broken, by going through the bugs and the mailing-list archives. And I'm
> still unable to come to a conclusion. Here's where I'm at:
>
>     a- queries which mix boolean operators require strict parenthesizing
> to
> work right
>
>     b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and
> the
> default operator "" rarely does what you expect
>
>     c- the stock QueryParser doesn't work well in these cases
>
>     d- there's a new PrecedenceQueryParser at
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneousthat
> solves *some* of the issues but creates others
>
>     e- there is a non-Lucene effort to create a query parser with a
> different syntax at http://famestalker.com/devwiki/
>
> While we are also developing a query-building UI, users must be able to
> enter text queries as well. What do other folks do? I mean, this is pretty
> bad. I can hardly go back to my scientists and tell them Lucene is unable
> to
> handle 2 boolean operators, that they should parenthesize everything by
> hand. I mean, that's just cheesy.
>
> --Renaud
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message