Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Mon, 5 Dec 2016 16:15:58 +0000 (UTC)
From: "Joel Bernstein (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <JIRA.12933820.1453730983000.439867.1480954558537@Atlassian.JIRA>
In-Reply-To: <JIRA.12933820.1453730983000@Atlassian.JIRA>
References: <JIRA.12933820.1453730983000@Atlassian.JIRA> <JIRA.12933820.1453730983078@arcas>
Subject: [jira] [Comment Edited] (SOLR-8593) Integrate Apache Calcite into
 the SQLHandler
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 05 Dec 2016 16:16:05 -0000


    [ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722648#comment-15722648 ] 

Joel Bernstein edited comment on SOLR-8593 at 12/5/16 4:15 PM:
---------------------------------------------------------------

I wanted to give an update on my work on this ticket.

I've started working my way through the test cases (TestSQLHandler). I'm working through each assertion in each method to understand the differences between the current release and the work done in this patch, and making changes/fixes as I go.

The first change that I made was in how the predicate is being traversed. The current patch doesn't descend through a fully nested AND/OR predicate, so I made a few changes to how the tree is walked. I also changed somewhat how the query is rewritten to a Lucene/Solr query so that it matches the current implementation.
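
As a rough illustration of descending through a nested AND/OR predicate and emitting a Lucene-style query string, here is a minimal sketch. It is not the code in the patch: the Eq and BoolOp classes and the to_query function are hypothetical names used only for this example, and the real implementation walks Calcite's expression tree rather than these toy nodes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Eq:
    """Hypothetical leaf node: field = value."""
    field: str
    value: str


@dataclass
class BoolOp:
    """Hypothetical inner node: AND/OR over any number of operands."""
    op: str
    operands: List["Node"]


Node = Union[Eq, BoolOp]


def to_query(node: Node) -> str:
    """Recursively render a predicate tree as a Lucene-style query string."""
    if isinstance(node, Eq):
        return f"{node.field}:{node.value}"
    if isinstance(node, BoolOp):
        if node.op not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {node.op}")
        # Recurse into every operand so nested groups at any depth are visited.
        parts = [to_query(child) for child in node.operands]
        return "(" + f" {node.op} ".join(parts) + ")"
    raise TypeError(f"unexpected node type: {type(node).__name__}")
```

Each boolean group is parenthesized so that the nesting of the SQL predicate survives in the generated query, whatever the default operator precedence of the query parser is.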

I've now moved on to aggregate queries. I've been investigating the use of EXPR$1 ... instead of using the *function signature* in the result set. It looks like we'll have to use the Calcite expression identifiers going forward, which should be OK. I think this is cleaner anyway because looking up fields by a function signature can get cumbersome. We'll just need to document this in CHANGES.txt.
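
To make the naming change concrete, here is a small sketch of how result-set field names might be derived. It assumes that an unaliased expression is named EXPR$ followed by its zero-based position in the select list, which is consistent with the EXPR$1 mentioned above but is an assumption of this example. The output_names function and its input tuples are hypothetical, not part of Solr or Calcite.

```python
def output_names(select_items):
    """Return the result-set field name for each select-list item.

    Each item is a (expression, alias, is_plain_column) tuple.
    Assumption: unaliased expressions get 'EXPR$<zero-based position>'.
    """
    names = []
    for position, (expression, alias, is_plain_column) in enumerate(select_items):
        if alias is not None:
            names.append(alias)            # an explicit alias always wins
        elif is_plain_column:
            names.append(expression)       # a bare column keeps its own name
        else:
            names.append(f"EXPR${position}")  # generated identifier
    return names
```

Under that assumption, a query such as "SELECT a, count(*) FROM t GROUP BY a" would expose the count as EXPR$1 rather than as count(*), and adding an alias in the SQL gives a stable name to look up.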

The next step for me is to implement the aggregationMode=facet logic for aggregate queries. After that I'll push out my changes to this branch.

Then I'll spend some time investigating how SELECT DISTINCT behaves in our implementation. As [~julianhyde] mentioned, we should see DISTINCT queries as aggregate queries, so it's possible we'll have all the code in place to push this to Solr already.
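
The equivalence behind treating DISTINCT as an aggregate can be checked with any SQL engine. The sketch below uses Python's built-in sqlite3 module purely for illustration; it is not Solr or Calcite, and the table t and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a few duplicate rows (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("x", 2), ("y", 1)],
)

# SELECT DISTINCT over a set of columns ...
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b FROM t ORDER BY a, b"
).fetchall()

# ... returns the same rows as a GROUP BY over those columns
# with no aggregate functions in the select list.
grouped_rows = conn.execute(
    "SELECT a, b FROM t GROUP BY a, b ORDER BY a, b"
).fetchall()

conn.close()
```

Because both queries yield the same rows, a planner that rewrites DISTINCT into a GROUP BY can reuse whatever pushdown logic already exists for aggregate queries.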


> Integrate Apache Calcite into the SQLHandler
> --------------------------------------------
>
>                 Key: SOLR-8593
>                 URL: https://issues.apache.org/jira/browse/SOLR-8593
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>         Attachments: SOLR-8593.patch, SOLR-8593.patch
>
>
>    The Presto SQL Parser was perfect for phase one of the SQLHandler. It was nicely split off from the larger Presto project and it did everything that was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. This is where Apache Calcite comes into play. It has a battle-tested, cost-based optimizer and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans will continue to be translated to Streaming API objects (TupleStreams), so continued work on the JDBC driver should plug in nicely with the Calcite work.

