incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Simplifying the Query Parser
Date Mon, 02 May 2011 15:43:05 GMT

On Mon, Apr 18, 2011 at 02:16:33PM -0700, David E. Wheeler wrote:
> On Apr 18, 2011, at 12:41 PM, Marvin Humphrey wrote:

> > The others are all solvable by tightening up the parser:
> > 
> >  * Currently field names must match /[0-9A-Za-z_]+/.  We should require
> >    them to be identifiers, i.e. they must not start with a number:
> >    /[A-Za-z_][0-9A-Za-z_]*/
> >  * QueryParser should use single-token lookahead to enforce that field name
> >    constructs must be followed by something sensible.
> 
> And if it's not, what then?

If it's not, as in the forward slash following the colon in
'http://www.apache.org/', then we consume the whole thing as a leaf --
typically resulting in a phrase query.

> What if it's sensible but the field doesn't exist (or is private)?

For now, we just use the field name.  Starting in 0.2.0, I think we should
consider parsing such constructs as NoMatchQueries.  But that's a more
involved change.

> > This issue should not block 0.1.0, which is almost done.  
> 
> Well, if the handling of PHP::Interpreter is a bug, should that not be fixed
> before 0.1.0?

Yes, I agree.

With r1096201 and r1096207, the two changes proposed above have been
implemented.  There is no change in behavior unless set_heed_colons() has been
invoked.  If heed_colons is true, then the following query strings will now
produce sensible results:

    http://www.apache.org/
    10:30
    PHP::Interpreter

This will still produce an unexpected result:

    mailto:me@example.com
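
To illustrate why the first three strings now behave and the last one does not, here is a rough Python sketch of the two rules. It is not the actual QueryParser code. The function name split_field and the SENSIBLE_START pattern are hypothetical; in particular, treating "something sensible" as a word character, a quote, or an opening paren is an assumption made only for this example.

```python
import re

# Rule 1: a field name must be an identifier, so it cannot start with a digit.
FIELD_PREFIX = re.compile(r"([A-Za-z_][0-9A-Za-z_]*):")

# Rule 2 (assumed definition): what may sensibly follow "field:".
# This pattern is a guess for illustration, not taken from the parser.
SENSIBLE_START = re.compile(r"[0-9A-Za-z_\"'(]")


def split_field(token):
    """Return (field, rest) if token looks like field:value, else (None, token).

    Returning (None, token) stands for "consume the whole thing as a leaf".
    """
    m = FIELD_PREFIX.match(token)
    if m is None:
        # No identifier followed by a colon at the start, e.g. "10:30".
        return (None, token)
    rest = token[m.end():]
    # Single-token lookahead: peek at what follows the colon.
    if not SENSIBLE_START.match(rest):
        # e.g. the "/" in "http://..." or the second ":" in "PHP::Interpreter".
        return (None, token)
    return (m.group(1), rest)


if __name__ == "__main__":
    for q in ("http://www.apache.org/", "10:30",
              "PHP::Interpreter", "mailto:me@example.com"):
        print(q, "->", split_field(q))
```

Under these assumptions, "mailto" is a valid identifier and "me" is a sensible thing to follow the colon, so the string is still split into a field name and a value, which is why it remains a problem case.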

Marvin Humphrey

