incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Future of Blur Query Language
Date Thu, 30 Aug 2012 00:05:01 GMT
I think a limited SQL implementation in the blur-jdbc lib is a good idea, but not as the main
query language.  I agree with Tim that, although extending the Lucene syntax will be difficult,
it's the best approach going forward.  I think most of what's needed for blur 0.1
types can be supported with the standard query parser. The issue I've had is dealing with
how to control joins.

I did some prototyping of the current query parser and it looks like we could use the parser
itself for joins. Let me explain. 

With superOn=true

+cf1.f1:1234 +cf1.f1:5678

Would yield a Boolean query with 2 clauses of term queries. Like:

Bq(+cf1.f1:1234 +cf1.f1:5678)

However if you group the Boolean query

+(+cf1.f1:1234) +(+cf1.f1:5678)

It parses into:

Bq(+bq(+cf1.f1:1234) +bq(+cf1.f1:5678))

Basically it maintains the grouping; at some point in the past it did not (I think). With
this grouping we could implement the super query (join). So the query could turn into this:

Bq(+superquery(+cf1.f1:1234) +superquery(+cf1.f1:5678)) 

This would allow us to not have to modify the lucene syntax for "joins".  I think this is
the simplest approach for 0.1.x. In 0.2 we can discuss more extensions to the lucene syntax.
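
A minimal sketch of that rewrite step, in Python rather than against the real Lucene classes. The names Term, BooleanQuery, SuperQuery and wrap_groups are hypothetical stand-ins, not Blur or Lucene APIs. It assumes the parser has already produced the grouped tree shown above, and it leaves ungrouped clauses untouched, since how those should behave is not settled here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    field: str
    text: str


@dataclass(frozen=True)
class BooleanQuery:
    # Each clause is (occur, query), where occur is "+", "-" or "".
    clauses: Tuple[Tuple[str, "Query"], ...]


@dataclass(frozen=True)
class SuperQuery:
    # Hypothetical stand-in for Blur's super query (join) wrapper.
    inner: BooleanQuery


Query = Union[Term, BooleanQuery, SuperQuery]


def wrap_groups(parsed: BooleanQuery) -> BooleanQuery:
    """Wrap every top-level grouped clause in a SuperQuery."""
    rewritten = []
    for occur, sub in parsed.clauses:
        if isinstance(sub, BooleanQuery):
            sub = SuperQuery(sub)  # a group becomes one join unit
        rewritten.append((occur, sub))
    return BooleanQuery(tuple(rewritten))


def clauses_str(bq: BooleanQuery) -> str:
    return " ".join(occur + show(sub) for occur, sub in bq.clauses)


def show(q: Query, top: bool = False) -> str:
    """Render a query in the Bq(...) notation used in this message."""
    if isinstance(q, Term):
        return f"{q.field}:{q.text}"
    if isinstance(q, SuperQuery):
        return f"superquery({clauses_str(q.inner)})"
    return ("Bq" if top else "bq") + f"({clauses_str(q)})"


# The grouped example: +(+cf1.f1:1234) +(+cf1.f1:5678)
parsed = BooleanQuery((
    ("+", BooleanQuery((("+", Term("cf1.f1", "1234")),))),
    ("+", BooleanQuery((("+", Term("cf1.f1", "5678")),))),
))
print(show(parsed, top=True))
print(show(wrap_groups(parsed), top=True))
```

The point of the sketch is only that the rewrite is a pass over the parsed tree, so the query syntax itself does not need to change.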


Thoughts?

Aaron



On Aug 28, 2012, at 10:45 PM, Tim Williams <williamstw@gmail.com> wrote:

> On Sun, Aug 26, 2012 at 10:01 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <tim.tutt@gmail.com> wrote:
>>> Aaron,
>>> 
>>> Just for a little clarification on your example, when you say JOIN, are you
>>> actually just talking about a union of two sets or are you actually
>>> referring to the relational type of join where the intent is to merge them
>>> into a single record? If it's the former, wouldn't a simple OR suffice?
>> 
>> Well it's a little different in the Lucene world, but in essence it
>> would be the latter.  However the result is not a single Record but
>> rather a Row that contains the 2 Records.
>> 
>> Take a look at this link:
>> http://lucene.apache.org/core/3_6_1/api/contrib-join/org/apache/lucene/search/join/package-summary.html
>> 
>> Blur uses the Index-time joins, but it's an internal piece of code.
>> Blur doesn't actually use this contrib although maybe it should.
>> 
>>> 
>>> Provided that I am in fact missing something, here are my thoughts on the
>>> query language:
>>> 
>>> A common theme that I have seen across the board with commercial
>>> search/discovery products is the creation of a query language modeled after
>>> SQL with varying limitations. This tends to be fairly effective as the
>>> learning curve is not too steep for users who have experience writing SQL
>>> queries and dealing with relational databases. Additionally, these users
>>> normally find a way to live with the limitations of the language and find
>>> ways around the problems they are trying to solve as the language is
>>> typically advanced enough to be creative.
>>> 
>>> Such a language, however, does not lend itself well to the less advanced
>>> end users of your product. Perhaps in certain cases this is acceptable as
>>> you will always have some advanced user available, but in the cases where
>>> these advanced users are in limited supply the learning curve becomes
>>> steeper as the technical ability and know-how decreases.
>> 
>> I agree with your assessment of a SQL-like language, my fear in making
>> this the standard for all queries in Blur is the extra syntax the
>> language would require.  For example:
>> 
>> "select * from test_table where super = 'test';"
>> 
>> But this really isn't correct because in sql this would mean an exact
>> match and you would have to index the data in several different ways
>> to make super = 'test' work.  Instead it should be something like:
>> 
>> "select * from test_table where super like 'test';"
>> 
>> However in Lucene syntax and CQL it's just:
>> 
>> "test"
>> 
>> Also I like the separation of what to return from the query, as well
>> as where to start, how many to fetch, etc.
>> 
>> Blur has a JDBC project, perhaps both can be used.  We could use SQL
>> as a control language for passing what to select, sort by, etc and let
>> CQL be the query language.
> 
> While once a fan, I'd hope CQL isn't the answer.  We'd lose
> field/index projections over boolean clauses and be limited to prox
> being a boolean operator - those aren't fixable without straying from
> the spec.  The CQL spec peeps also seem disconnected from any
> implementation such that none of the latter strictly resemble the
> former - and there appears little opportunity for implementations in
> the wild to actually inform the specification.
> 
> So I like your Option1:)  If we just extend lucene's syntax it gets
> over your biggest concern - though it does leave a *lot* of work to be
> done:(
> 
> blurQuery ::= luceneQuery (havingClause)? (sortClause)?
> 
> havingClause ::= 'HAVING' luceneQuery //not sure if this is a subset or not?
> 
> sortClause ::= 'sortby' field
> 
> Thanks,
> --tim
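
To make the grammar Tim proposes concrete, here is a rough Python sketch that splits a query string into its three parts before the Lucene portions are handed to a parser. The function split_blur_query and the BlurQueryParts type are hypothetical names, not part of Blur. It assumes the keywords are written exactly as 'HAVING' and 'sortby', and it does not handle a keyword that appears inside a quoted phrase.

```python
import re
from typing import NamedTuple, Optional


class BlurQueryParts(NamedTuple):
    query: str                 # luceneQuery
    having: Optional[str]      # luceneQuery after 'HAVING', if present
    sort_field: Optional[str]  # field after 'sortby', if present


# blurQuery ::= luceneQuery (havingClause)? (sortClause)?
_BLUR_QUERY = re.compile(
    r"^(?P<query>.+?)"
    r"(?:\s+HAVING\s+(?P<having>.+?))?"
    r"(?:\s+sortby\s+(?P<sort>\S+))?$"
)


def split_blur_query(text: str) -> BlurQueryParts:
    """Split a query string into main query, HAVING query and sort field."""
    match = _BLUR_QUERY.match(text.strip())
    if match is None:
        raise ValueError(f"not a blurQuery: {text!r}")
    return BlurQueryParts(match["query"], match["having"], match["sort"])


print(split_blur_query("+cf1.f1:1234 HAVING cf2.f2:abc sortby cf1.f1"))
print(split_blur_query("test"))
```

A real implementation would more likely extend the query parser's own grammar than pre-split the string; the sketch only shows how the three clauses relate.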
