lucene-dev mailing list archives

From Brian Goetz <br...@quiotix.com>
Subject Re: Bug? QueryParser may not correctly interpret RangeQuery text
Date Mon, 03 Jun 2002 01:23:20 GMT

>It's true that the unsophisticated end-user would not
>use SQL, but between range (inclusive, exclusive),
>boolean, fuzzy, etc., the simple query parser you have
>is evolving into something more complex than SQL.

Which is a reasonable argument that range queries are outside the scope of 
what the query parser is supposed to do.

>While SQL supports them with key words, we are getting
>into an endless quest for unused characters to mark
>the latest variation of the query.

I wasn't too happy about having added ranges in the first place for exactly 
this reason.  The query parser is supposed to be a convenience, a 90/10 
(actually, more like a 99/1) solution: one which handles 90% of the queries 
with 10% of the work.  Pushing for that last 10% at the expense of the 
first 90% is a bad tradeoff IMO.  The raw query classes still work fine for 
that last 10% (or 1%).

>By the way, it
>seems that you already have support for the "WHERE
>..." part (AND, OR, NOT, NEAR). If we had "LIKE" and
>"BETWEEN ... AND ..." we would have almost everything
>SQL has for the matching part.

Two responses to this:

1.  Wrong.  We don't have NEAR at all, and AND, OR, and NOT are simple 
operators which give hints to the BooleanQuery class; they don't impart 
structure to the query.  They are, in fact, a convenience for expressing
   (+a +b)
as
   a AND b
mostly because mainstream search engines support AND and OR.
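
To make the "hints, not structure" point concrete, here is a toy sketch of that rewriting. It is not Lucene's QueryParser grammar; the class OperatorHints and the method toFlatClauses are made up for illustration, and grouping, phrases, and fields are ignored. It only shows that AND, OR, and NOT end up as required/prohibited flags on a flat list of clauses.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT Lucene's QueryParser. All names are hypothetical.
class OperatorHints {

    // Rewrites whitespace-separated input such as "a AND b" into flat,
    // prefix-flagged clauses such as "+a +b".
    static String toFlatClauses(String input) {
        List<String> terms = new ArrayList<>();
        List<Character> flags = new ArrayList<>(); // '+', '-', or ' ' (optional)
        boolean nextRequired = false;
        boolean nextProhibited = false;

        for (String tok : input.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields one empty token
            }
            if (tok.equals("AND")) {
                // AND marks the clause before it and the clause after it as required.
                int last = flags.size() - 1;
                if (last >= 0 && flags.get(last) == ' ') {
                    flags.set(last, '+');
                }
                nextRequired = true;
            } else if (tok.equals("OR")) {
                // OR leaves both neighbours optional, so there is nothing to record.
            } else if (tok.equals("NOT")) {
                nextProhibited = true;
            } else {
                // A plain term: attach whatever hint the preceding operator left.
                terms.add(tok);
                flags.add(nextProhibited ? '-' : (nextRequired ? '+' : ' '));
                nextRequired = false;
                nextProhibited = false;
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (flags.get(i) != ' ') {
                out.append(flags.get(i));
            }
            out.append(terms.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFlatClauses("a AND b"));
        System.out.println(toFlatClauses("a AND NOT b"));
    }
}
```

Nothing in the output nests: the operators disappear and only per-clause flags remain, which is why they are closer to syntactic sugar than to a SQL-style WHERE clause.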

2.  The argument that "we already have half of it, let's go all the way" is 
a siren song.  In lots of cases, this is basically equivalent to "two 
wrongs make a right."  As in "we already violated the XYZ principle for 
some purpose, so there's no point in letting principle get in the way of 
further 'progress'".

In this particular case, it's not quite as bad as that, but it's taking us in 
a dangerous direction.  The query parser is not a structured language for 
free text queries -- if it were, it should be designed from the ground up to 
be so.  In cramming in too many features, it would be easy for it to lose 
its most valuable feature -- simplicity.  We may have already done that, 
but there's no point in pushing further just in case there's any doubt.

>I think that the only way to have a query that does
>NOT look like a programming language is to have
>natural language understanding (which we won't have
>for a while.) Once the end user is forced to learn the
>difference between terms and operators, he already is
>in the realm of programming languages.

This is a strong argument for backing out some of the features already 
added so far, but I'm sure that's not what you're suggesting (although 
maybe you should be.)

But I think this argument is basically hogwash.  Don't forget we're arguing 
about features which will be used by less than 1% of the user base, and 
probably less than a 100th or maybe a 1000th of 1% of all queries entered 
through the query parser.

Right now, we have several ways of building queries:
  - a simple query parser, which can handle the basics (terms, phrases, 
field search, slop, wildcards);
  - a flexible and powerful set of query classes with which developers can 
build arbitrary queries;
  - a combination of the above: let the user enter query terms, parse them 
into a Query, and then combine that with other query clauses built from 
input in a user interface (such as a date picker).
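
The third option above might look roughly like the following. This is a self-contained sketch under stated assumptions, not Lucene code: the small Query, TermQuery, RangeQuery, and BooleanQuery types below are stand-ins that only borrow the names of the query classes, and parseUserText, withDateRange, the "date" field, and the yyyyMMdd strings are all hypothetical. The idea is that the range never passes through the query parser; it is built directly from the date picker's values and attached as a required clause.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Lucene classes.
interface Query { }

class TermQuery implements Query {
    final String field;
    final String text;
    TermQuery(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
}

class RangeQuery implements Query {
    final String field;
    final String lower;
    final String upper;
    RangeQuery(String field, String lower, String upper) {
        this.field = field; this.lower = lower; this.upper = upper;
    }
    @Override public String toString() { return field + ":[" + lower + " TO " + upper + "]"; }
}

class BooleanQuery implements Query {
    private final List<String> clauses = new ArrayList<>();

    // Adds a clause flagged as required (+), prohibited (-), or optional.
    BooleanQuery add(Query q, boolean required, boolean prohibited) {
        String prefix = prohibited ? "-" : (required ? "+" : "");
        String body = (q instanceof BooleanQuery) ? "(" + q + ")" : q.toString();
        clauses.add(prefix + body);
        return this;
    }

    @Override public String toString() { return String.join(" ", clauses); }
}

class CombineExample {

    // Hypothetical stand-in for the simple parser: whitespace-separated
    // terms on a default field, all optional.
    static Query parseUserText(String defaultField, String text) {
        BooleanQuery q = new BooleanQuery();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                q.add(new TermQuery(defaultField, t), false, false);
            }
        }
        return q;
    }

    // Wraps the user's query and a UI-supplied date range as two required clauses.
    static Query withDateRange(Query userQuery, String from, String to) {
        return new BooleanQuery()
                .add(userQuery, true, false)
                .add(new RangeQuery("date", from, to), true, false);
    }

    public static void main(String[] args) {
        Query typed = parseUserText("body", "lucene parser");
        System.out.println(withDateRange(typed, "20020101", "20020603"));
    }
}
```

With this split, the text box stays simple for the end user, and the developer decides in code how the extra constraint is expressed.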

Now, if you want to design a new query language, one which is actually 
designed for its intended purpose (rather than having features accreted 
every time someone feels that XYZ query structure is critical enough to go 
in the query parser), be my guest -- I'll help, I'll even write the parser 
for you.  We can call it the AdvancedQueryParser or whatever you want to 
call it, and I won't throw stones at your design.

But I'm going to vigorously -1 any proposal for the query parser that 
makes the Joe Users out there pay for features that are only of interest to 
Joe Gooroo.

Nobody has convinced me at all that the existing query parser is inadequate 
for its intended purpose.


--
Brian Goetz
Quiotix Corporation
brian@quiotix.com           Tel: 650-843-1300            Fax: 650-324-8032

http://www.quiotix.com

