lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: QueryParser Is Badly Broken
Date Fri, 13 Oct 2006 07:23:38 GMT
On Friday 13 October 2006 01:55, Mark Miller wrote:
> There is also the Surround Query Parser in contrib by the way...I would bet
> that Paul will tell you that it does not have these issues. I can't wait to

Indeed.

> see the replies on this one...I didn't realize that the QueryParser had
> these problems and am a bit skeptical...unfortunately I am away from home
> and cannot check it out.

Surround is not perfect, either. One of the disadvantages of surround
is that it does not map to PhraseQuery.

> On another note...http://famestalker.com/devwik/ will be done soon...I only

The url gives a not found 404 error here.

> have not gotten around to finishing the final touches because there did not
> appear to be a lot of initial interest (and what there was has waned
> drastically) and I am not ready to use it myself yet. It does correctly
> handle order of operations however, and as far as I know is the only parser
> to handle arbitray nesting and mixing of boolean and proximity queries.
> (perhaps surround does as well...I would be really interested to know, but I
> assume that it handles only the base cases ie not "(car & basket) within 2
> of (horse & carriage within 3 of car). Of course who really cares about such
> queries, but hey ;)

Surround maps proximity queries to SpanNearQuery, and that only allows
OR'ing in its operands. Surround does not map to SpanNotQuery,
but it will parse nested proximity queries and generate a nested
SpanNearQuery.
 
> You'll get better advice from others more experienced, but my bet is that
> Paul's surround parser is top notch and correctly does what you want.
> 
> - Mark
> 
> >
> 
> On 10/12/06, Renaud Waldura <renaud+lucene@waldura.com> wrote:
> >
> > I'm developing an application used by scientists -- people who have a
> > pretty
> > good idea of what logic is -- and they were shocked to find out that
> > neither
> > of these queries return the same results:
> >
> > 1- banana AND apple OR orange
> > 2- banana AND (apple OR orange)
> > 3- (banana AND apple) OR orange
> >
> > I'd expect (1) to be either (2) or (3), but it turns out it's parsed as
> > "+banana apple orange". I was rather, uh, dismayed by this find, as it
> > doesn't seem to make sense.
> >
> > I just spent half a day reading up on the various ways QueryParser is
> > broken, by going through the bugs and the mailing-list archives. And I'm
> > still unable to come to a conclusion. Here's where I'm at:
> >
> >     a- queries which mix boolean operators require strict parenthesizing
> > to
> > work right
> >
> >     b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and
> > the
> > default operator "" rarely does what you expect
> >
> >     c- the stock QueryParser doesn't work well in these cases
> >
> >     d- there's a new PrecedenceQueryParser at
> > 
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneousthat
> > solves *some* of the issues but creates others
> >
> >     e- there is a non-Lucene effort to create a query parser with a
> > different syntax at http://famestalker.com/devwiki/
> >
> > While we are also developing a query-building UI, users must be able to
> > enter text queries as well. What do other folks do? I mean, this is pretty
> > bad. I can hardly go back to my scientists and tell them Lucene is unable
> > to
> > handle 2 boolean operators, that they should parenthesize everything by
> > hand. I mean, that's just cheesy.

For a query building UI it might be better to output queries in XML form
to a Lucene server, see contrib/xml-query-parser .

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message