lucy-dev mailing list archives

From "David E. Wheeler" <>
Subject [lucy-dev] Simplifying the Query Parser
Date Fri, 15 Apr 2011 22:41:22 GMT

Marvin and I were just discussing the QueryParser on IRC. Years ago, I reported a bug in the
KinoSearch query parser:

Basically, if I searched on "PHP::Interpreter", the parser died. Marvin fixed this bug, and
I think partly as a result of this, introduced the `heed_colons` attribute that persists today
in Lucy::Search::QueryParser. But as I understand it, `heed_colons` has three issues:

1. It adds complexity to the parser (simpler is better).
2. It has a security vulnerability: If a user searches on "secret_field:foo", it will search
only secret_field, and you might not want that.
3. If a field doesn't exist, the results may be meaningless.

In discussing these issues with Marvin, he expressed a strong desire not to get into QueryParser
wars, and I can understand that. I think that one of the strengths of Lucy is that the default
QueryParser offers a decent 80% solution for most users, while giving toolkit hackers the power
to do even more. With that in mind, I think we've come up with a solution to the above
issues that actually *simplifies* QP a bit:

* Deprecate heed_colons. Always heed colons.
* If you search for "foo:bar" and the field "foo" doesn't exist or is not public, treat it
as a term.

So addressing the above three points, this change would:

1. Remove complexity (or at least deprecate it)
2. Prevent private fields from being searched
3. Return relevant results when a colon term does not match a public field.

As a result, "module:PHP::Interpreter" will properly search "PHP OR Interpreter IN module",
"PHP::Interpreter" will search "PHP OR Interpreter", and "secret_field:whatever" will
search "secret OR field OR whatever".
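
To make the proposed rule concrete, here is a rough sketch in Python rather than Lucy's own API. The function name `parse_clause`, the `public_fields` set, and the split-on-non-alphanumerics tokenizer are all assumptions made for illustration; a real analyzer would tokenize differently.

```python
import re

def parse_clause(clause, public_fields):
    """Hypothetical helper: return (field, terms) for one query clause.

    field is None when the clause should be searched as plain terms.
    """
    # Always heed the first colon, but only honor it as a field prefix
    # when the name before it is a known public field.
    prefix, sep, rest = clause.partition(":")
    if sep and rest and prefix in public_fields:
        field, text = prefix, rest
    else:
        # Unknown or private field: treat the whole clause as terms.
        field, text = None, clause
    # Assumed tokenizer: split on any run of non-alphanumeric characters.
    terms = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    return field, terms

public = {"module"}  # "secret_field" is deliberately not public
for query in ("module:PHP::Interpreter", "PHP::Interpreter", "secret_field:whatever"):
    print(query, "->", parse_clause(query, public))
```

Under these assumptions, only the first query is restricted to a field; the other two fall back to an OR of their terms, which is why a private field name can never narrow the search.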



