incubator-lucy-dev mailing list archives

From Peter Karman
Subject Re: [lucy-dev] Simplifying the Query Parser
Date Sat, 16 Apr 2011 03:27:28 GMT
David E. Wheeler wrote on 4/15/11 5:41 PM:

> * Deprecate heed_colons. Always heed colons.

+1 to that

> * If you search for "foo:bar" and the field "foo" doesn't exist or is not 
> public, treat it as a term.

This, I'm not so sure about. At least, it should be configurable, with a
'strict' flag or similar, which if enabled, would throw errors if 'foo' was not
a defined field. Otherwise, depending on the user, we're in the realm of silent
failure rather than gracious host.
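A minimal sketch of what such a 'strict' flag could look like, in Python for brevity. Everything here is hypothetical: `split_field`, `known_fields`, `strict`, and `UnknownFieldError` are illustrative names, not part of Lucy's QueryParser.

```python
class UnknownFieldError(ValueError):
    """Raised in strict mode when a query names a field that is not defined."""


def split_field(query, known_fields, strict=False):
    """Split 'field:rest' into (field, rest); field is None when no field applies.

    Hypothetical helper: only the text before the first colon is considered
    as a candidate field name.
    """
    prefix, colon, rest = query.partition(":")
    if not colon:
        return None, query      # no colon at all: plain term
    if prefix in known_fields:
        return prefix, rest     # recognised field restriction
    if strict:
        # Strict mode: fail loudly instead of silently reinterpreting the query.
        raise UnknownFieldError(f"unknown field: {prefix!r}")
    return None, query          # lenient mode: keep the whole string as a term
```

With `known_fields={"title"}`, the lenient call `split_field("foo:bar", {"title"})` hands back the whole string as a term, while the same call with `strict=True` raises. Note that under this sketch a strict parser would also reject "PHP::Interpreter", since "PHP" is not a field, which is one reason the flag probably has to be optional.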

Do we even have a feature for public/private fields atm?

The reason I join Marvin in wishing to avoid the QueryParser wars is that
parsers are notoriously hard to get 100% correct for the 80% of features most
applications require. And for the other 20% it really becomes application
specific as to how the parser should behave. Do you want it to be strict and
exacting like a SQL parser? Or fuzzier, like an NLP-type parser? My apps lean
toward the former because I use Lucy to front dbs as a fast, denormalized index.
Still, both kinds are necessary, since the fuzzy, forgiving parser is what most
users (mine included!) expect from a single-input-box search engine.

So what kind of parser should our QueryParser be? I don't think it can be all
things to all users, so being most things to most users is actually a pretty
good goal, imo.

I think the focus on reliable, flexible *Query classes has been a good design
choice to date, because it means that it is quite straightforward to roll your
own query parser (as I have done), entirely suited to your application's needs,
sidestepping the Lucy QueryParser altogether. That's good library design, imo.
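To illustrate the roll-your-own approach, here is a rough sketch of a tiny SQL-ish parser that emits a tree of query objects. The `Term` and `And` dataclasses are stand-ins written for this example, not Lucy's actual Query classes, and the default field name "content" is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """Stand-in for a term query: one piece of text in one field."""
    field: str
    text: str


@dataclass(frozen=True)
class And:
    """Stand-in for a conjunction: every child must match."""
    children: Tuple[Term, ...]


def parse(query: str, default_field: str = "content") -> And:
    """Parse whitespace-separated 'field:value' clauses into an And of Terms."""
    clauses = []
    for token in query.split():
        field, colon, text = token.partition(":")
        if not colon:
            # No field prefix: search the default field.
            field, text = default_field, token
        clauses.append(Term(field, text))
    return And(tuple(clauses))
```

For example, `parse("title:lucy parser")` yields an `And` over `Term("title", "lucy")` and `Term("content", "parser")`. An application-specific parser can stay this small because the matching itself lives in the query objects, not in the parser.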

I agree that Lucy QueryParser should get simpler over time, by losing features,
but I'm not convinced that the behavior you're proposing actually does that. I'm
happy to be convinced though.

> As a result "module:PHP::Interpreter" will properly search "PHP OR 
> Interpreter IN module" and "PHP::Interpreter" will search "PHP OR 
> Interpreter", and "secret_field:whatever" will search "secret OR field OR 
> whatever".

I would expect 'module:PHP::Interpreter' to properly search 'PHP AND Interpreter
IN module' instead.
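The two readings differ only in the operator used to join the terms left over once the field prefix is removed. A small sketch, assuming a simple split on non-word characters in place of a real analyzer (`describe`, `known_fields`, and `default_op` are made-up names for illustration):

```python
import re


def describe(query, known_fields, default_op="AND"):
    """Render how a query would be interpreted, as a human-readable string."""
    prefix, colon, rest = query.partition(":")
    if colon and prefix in known_fields:
        field, text = prefix, rest
    else:
        # Unknown or missing field: treat the whole string as search text.
        field, text = None, query
    # Assumption: terms are runs of word characters; a real analyzer may differ.
    terms = re.findall(r"\w+", text)
    joined = f" {default_op} ".join(terms)
    return f"{joined} IN {field}" if field else joined


print(describe("module:PHP::Interpreter", {"module"}))              # PHP AND Interpreter IN module
print(describe("module:PHP::Interpreter", {"module"}, "OR"))        # PHP OR Interpreter IN module
print(describe("PHP::Interpreter", {"module"}, "OR"))               # PHP OR Interpreter
```

Making the operator a parameter is one way to let an application pick the behavior its users expect without the parser having to guess.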

I'm not just quibbling here; I really do expect that. And maybe that illustrates
my point about parsers being hard to get just right; they define that murky
space where machines try to interpret messy human language and transform it into
something more machine-like. A fraught exercise. But so fun! That's why I'm
interested in this problem space. :)

Peter Karman
