incubator-blur-dev mailing list archives

From Tim Tutt <tim.t...@gmail.com>
Subject Re: Future of Blur Query Language
Date Sat, 25 Aug 2012 20:48:56 GMT
Aaron,

Just for a little clarification on your example, when you say JOIN, are you
actually just talking about a union of two sets or are you actually
referring to the relational type of join where the intent is to merge them
into a single record? If it's the former, wouldn't a simple OR suffice?

Provided that I am in fact missing something, here are my thoughts on the
query language:

A common theme that I have seen across the board with commercial
search/discovery products is the creation of a query language modeled after
SQL with varying limitations. This tends to be fairly effective as the
learning curve is not too steep for users who have experience writing SQL
queries and dealing with relational databases. Additionally, these users
normally find a way to live with the limitations of the language and find
ways around the problems they are trying to solve as the language is
typically advanced enough to be creative.

Such a language, however, does not lend itself well to the less advanced
end users of your product. Perhaps in certain cases this is acceptable, as
you will always have some advanced user available, but in cases where
these advanced users are in limited supply, the learning curve becomes
steeper as technical ability and know-how decrease.

In taking a brief look at the spec for CQL, I tend to agree with your
assessment that it is the best option as it looks like it has the ability
to be flexible enough to fit both cases. It is possible that you will run
into limitations with the queries that your more advanced users are
interested in, but perhaps those are the cases where Blur is not a fit.


Tim

On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> I would like to start a thread on the topic of the future of Blur's query
> language.  Currently the "simpleQuery" is just a normal Lucene based
> syntax with a little magic to figure out the joins (via the
> SuperQuery) that the user probably intended.  Of course this guess
> work gets it wrong sometimes.  Let me explain with an example:
>
> Given the query with superOn:
>
> +cf1.field1:value1 +cf1.field2:value2
>
> The current implementation will ASSUME that you want to find where
> "cf1.field1" contains "value1" and where "cf1.field2" contains
> "value2" in the same Record because the column family is the same.
> i.e. NO JOIN across records
>
> But perhaps the user really does want a join, meaning that the user
> wants to find any Row that contains one or more Records that have a
> field "cf1.field1" that contains "value1" and one or more Records in
> the same Row (but not necessarily in the same Record) that contains a
> field "cf1.field2" that contains "value2".  i.e. JOIN
>
> Given the current implementation, the only way to force the JOIN is
> to do something like:
>
> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2
> nocf.nofield:somevalue)
>
> This will trick the parser into creating 2 separate join query
> (SuperQuery) objects and performing the JOIN.
>
>
> THIS IS UGLY.
>
> Here are the current criteria for a query language:
> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
> Span, etc.)
> - User-defined query types should be supported (extensible)
> - The query language should be compatible with any programming
> language so that the current thrift RPC can continue to be utilized
>
> Here are options that I have been thinking about:
>
> Option 1:
> Somehow extend the current Lucene Query syntax to support these "new"
> features.  The biggest issue I have with this is that we would be
> creating yet another query language that users would have to learn.
> Also I think that allowing users to extend the query language by
> adding their own types would require a rewrite of the Lucene
> implemented query parser.  So even starting with the Lucene query
> language would be a lot of work.
>
> Option 2:
> Some limited version of SQL or SQL like syntax, basically supporting
> normal SQL with limited join support (probably only natural joins).
> This would be nice, because most users understand SQL.  But because
> Blur cannot support all the various operations that SQL can provide,
> this will probably be frustrating to users.  And they will need to
> learn what Blur SQL will provide and any special Blur only syntax.  So
> this would again be like inventing another query language.
>
> Option 3:
> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
> confused with Cassandra Query Language.  Currently I like this option
> the best, because it has built-in extensibility as well as the normal
> options needed for a search engine.  Boolean, fuzzy, wildcard, etc.
>
> I really would like to get others' opinions here and any other options.
>  Thanks!
>
> Aaron
>
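
For readers following the thread, the two readings of the example query
(match within one Record versus JOIN across Records of the same Row) can be
sketched as below, alongside a plain OR for comparison. This is only an
illustration under stated assumptions: the in-memory Row/Record model and
the function names are hypothetical and are not Blur's or Lucene's API, and
"contains" is simplified to whitespace token membership.

```python
# Hypothetical model (NOT Blur's API): a Row is a list of Records, and each
# Record is a dict mapping "family.column" names to a string value.

def record_matches(record, field, term):
    """True if the record has `field` and `term` is one of its tokens."""
    return term in record.get(field, "").split()

def same_record_match(row, clauses):
    """No JOIN: one single Record must satisfy every (field, term) clause."""
    return any(
        all(record_matches(rec, field, term) for field, term in clauses)
        for rec in row
    )

def row_join_match(row, clauses):
    """JOIN: each clause must match some Record in the Row,
    but the clauses need not match the same Record."""
    return all(
        any(record_matches(rec, field, term) for rec in row)
        for field, term in clauses
    )

def row_or_match(row, clauses):
    """Plain OR: at least one clause matches some Record in the Row."""
    return any(
        any(record_matches(rec, field, term) for rec in row)
        for field, term in clauses
    )

clauses = [("cf1.field1", "value1"), ("cf1.field2", "value2")]

# Both values live in the same Record.
row_a = [{"cf1.field1": "value1", "cf1.field2": "value2"}]
# The values are split across two Records of the same Row.
row_b = [{"cf1.field1": "value1"}, {"cf1.field2": "value2"}]
# Only one of the two values is present anywhere in the Row.
row_c = [{"cf1.field1": "value1"}]
```

Under this simplified model, row_b is matched only by the JOIN reading, and
row_c is matched only by the plain OR, which suggests that an OR alone would
be broader than the JOIN described in the quoted message.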
