lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler
Date Fri, 16 Dec 2016 04:09:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753373#comment-15753373 ]

Joel Bernstein commented on SOLR-8593:
--------------------------------------

If we have two grouping fields *A, B*, nested facets will be gathered using the following
approach:

1) Gather the *top N* facets for field A.
2) For each of the *top N* facets of field A, find the *top N* sub-facets for field B.

This avoids the exhaustive processing of all the unique combinations of A, B.
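
As a rough illustration of the two steps above, here is a minimal in-memory sketch in Python. It is not Solr's faceting code: the function name nested_top_n, the row layout (a list of dicts) and the sample data are all assumptions made for the example.

```python
from collections import Counter

def nested_top_n(rows, field_a, field_b, n):
    """Hypothetical helper: top n values of field_a, each with its top n field_b values."""
    # Step 1: gather the top N facet values for field A by row count.
    top_a = [value for value, _ in Counter(r[field_a] for r in rows).most_common(n)]
    result = {}
    for a_value in top_a:
        # Step 2: inside each top A bucket, gather the top N sub-facets for field B.
        sub_counts = Counter(r[field_b] for r in rows if r[field_a] == a_value)
        result[a_value] = sub_counts.most_common(n)
    return result

# Made-up sample data: A has three distinct values, B has three distinct values.
rows = (
    [{"A": "x", "B": "p"}] * 3
    + [{"A": "x", "B": "q"}] * 2
    + [{"A": "x", "B": "r"}]
    + [{"A": "y", "B": "q"}] * 2
    + [{"A": "y", "B": "p"}]
    + [{"A": "z", "B": "p"}]
)

facets = nested_top_n(rows, "A", "B", 2)
```

With N = 2 the value "z" of A and the pair (x, r) are never reported, which is the point: at most N * N buckets are returned, rather than one per unique (A, B) combination.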

This performs very well (sub-second) when N is relatively small and the cardinality
of A and B is not too high.

In high-cardinality scenarios, we can switch to MapReduce mode, which sorts the Tuples on the
GROUP BY fields and shuffles them to worker nodes. In MapReduce mode, the order of the GROUP
BY fields is not important.
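
The sort-and-shuffle idea can be sketched in a few lines of Python. This is a single-process toy, not the Streaming API: the helper names, the CRC32-based partitioning and the sample tuples are assumptions chosen only to show that every tuple with the same GROUP BY key reaches the same worker, and that the field order does not change which groups are found.

```python
import zlib
from itertools import groupby

def group_key(tup, group_fields):
    # Key of a tuple: its values for the GROUP BY fields, in the given order.
    return tuple(tup[f] for f in group_fields)

def shuffle(sorted_tuples, group_fields, num_workers):
    """Partition tuples so equal keys land on the same (simulated) worker."""
    partitions = [[] for _ in range(num_workers)]
    for tup in sorted_tuples:
        key = group_key(tup, group_fields)
        worker = zlib.crc32(repr(key).encode("utf-8")) % num_workers
        partitions[worker].append(tup)  # appending keeps the sorted order
    return partitions

def map_reduce_count(tuples, group_fields, num_workers=2):
    """Hypothetical driver: sort on the GROUP BY fields, shuffle, count per group."""
    ordered = sorted(tuples, key=lambda t: group_key(t, group_fields))
    counts = {}
    for partition in shuffle(ordered, group_fields, num_workers):
        # Each worker sees its keys as contiguous runs, so one pass is enough.
        for key, run in groupby(partition, key=lambda t: group_key(t, group_fields)):
            counts[key] = sum(1 for _ in run)
    return counts

# Made-up sample tuples.
tuples = (
    [{"A": "x", "B": "p"}] * 3
    + [{"A": "y", "B": "q"}]
    + [{"A": "x", "B": "q"}] * 2
)

by_ab = map_reduce_count(tuples, ["A", "B"])
by_ba = map_reduce_count(tuples, ["B", "A"])
```

Unlike the top-N facet approach, every unique combination is counted here, and grouping on (A, B) or (B, A) yields the same groups with the key components swapped.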

Having the ability to use faceting or MapReduce depending on cardinality is one of the key
features of Solr's SQL implementation.

> Integrate Apache Calcite into the SQLHandler
> --------------------------------------------
>
>                 Key: SOLR-8593
>                 URL: https://issues.apache.org/jira/browse/SOLR-8593
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>         Attachments: SOLR-8593.patch, SOLR-8593.patch
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was nicely split off from the larger Presto project and it did everything that was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. Here is where Apache Calcite comes into play. It has a battle-tested, cost-based optimizer and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans will continue to be translated to Streaming API objects (TupleStreams), so continued work on the JDBC driver should plug in nicely with the Calcite work.




