drill-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Improvements to storage plugin planning integration support
Date Thu, 22 Oct 2015 18:26:05 GMT
I do not know how Phoenix's planning works. For Drill, my
understanding is that during logical planning the "collation" trait is
only used in SortRemoveRule, to remove a redundant sort operator. (Those
"sort" operators are the ones created by Calcite for a user-explicit
"ORDER BY" / "LIMIT", not the "enforcers" created in physical
planning.)

The "collation" trait has no impact on join / aggregation in logical
planning. The decision between sort-based and hash-based
join / aggregation is made in physical planning. At that stage,
"collation" matters a lot, because it determines whether Drill has to
add an "enforcer" to obtain the required trait in order to get a plan
with sort-based join / aggregation.

The "collation" trait acts like a physical property, so it is more
natural to expose "collation" in physical planning instead of logical
planning, which focuses more on properties inherent in the relational
expression. Aman's view that a secondary index is part of physical
planning makes sense to me.
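
To illustrate the physical-planning point above, here is a minimal sketch in plain Java. It is not Drill or Calcite code: CollationSketch, satisfies and planInput are hypothetical names, a collation is modeled as an ordered list of column names, and "satisfies" is assumed to mean a simple prefix match. The sketch only shows why an input that already delivers the required order (for example, an index sorted on the join key) avoids the extra sort "enforcer" that a sort-based join would otherwise need.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from Drill or Calcite.
class CollationSketch {

    // Assumption: a provided collation satisfies a required one when the
    // required sort keys are a prefix of the provided sort keys.
    static boolean satisfies(List<String> provided, List<String> required) {
        if (required.size() > provided.size()) {
            return false;
        }
        return provided.subList(0, required.size()).equals(required);
    }

    // Returns a textual plan for one input of a sort-based join: the scan
    // itself if it is already ordered as required, otherwise the scan wrapped
    // in a sort "enforcer".
    static String planInput(String scan, List<String> provided, List<String> required) {
        if (satisfies(provided, required)) {
            return scan;
        }
        return "Sort" + required + "(" + scan + ")";
    }
}
```

With these assumptions, an input ordered on (k1, k2) can feed a join on k1 directly, while an unordered input first needs a sort on k1.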

On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <maryann.xue@gmail.com> wrote:
> Hi Aman Sinha,
>
> Yes, Phoenix uses materialization in Calcite to model its secondary index
> querying. But it's not right to say "In that sense, it would seem to fit
> into physical planning phase rather than logical, since indexes are a
> faster physical access mechanism for a scan.  The logical properties of a
> table don't change due to presence of an index."
>
> A secondary index in Phoenix is a projection of part or all of the columns
> of the original table, and is usually indexed (and sorted) on a key other
> than the primary key of the original table. The key in a Phoenix table
> (HBase table) is crucial in two ways:
> 1. Filtering: the use of skip-scan or range-scan vs. full scan.
> 2. Ordering
>
> The second aspect is represented in Calcite by "collation" trait, which can
> make a radical difference in logical planning. Replacing the original table
> with one of its indices might end up changing the whole plan completely.
>
> I am not sure yet which stage the Phoenix materialization will eventually
> go into, but one thing is certain: it should be available for all the
> general optimizations to take effect.
>
>
> Thanks,
> Maryann
>
> On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <asinha@maprtech.com> wrote:
>
>> Catching up on this thread.  Jacques, if I understand correctly,  you are
>> proposing that instead of the single point of initialization of rules when
>> we instantiate FrameworkConfig (in DrillSqlWorker), we would have more
>> entry points to plug into different phases of planning and storage plugins
>> would register different sets of rules in these separate phases.   It seems
>> fine to me (assuming that there are no side effects where we somehow end up
>> increasing the search space for the existing plans).
>>
>> When talking about the Phoenix integration or the JDBC storage plugin, I
>> am curious which phase(s) they would register the rules for. I
>> believe Phoenix's materialized view usage in Calcite is actually for
>> secondary indexing, not for materialized views per se.  In that sense, it
>> would seem to fit into physical planning phase rather than logical, since
>> indexes are a faster physical access mechanism for a scan.  The logical
>> properties of a table don't change due to presence of an index.
>>
>> On the other hand, I think the JDBC plugin might register rules for
>> logical phase since  it would have filter and projection pushdowns that do
>> change logical properties.
>>
>> Aman
>>
>>
>> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>>
>>> I would +1 (1-3) for sure. I do not have much understanding of programs;
>>> however, additional flexibility for storage plugin devs sounds cool in
>>> general when used responsibly =) so +0 for (4).
>>>
>>>
>>> -H+
>>>
>>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <jacques@dremio.com>
>>> wrote:
>>>
>>> > The dead air must mean that everyone is onboard with my recommendation
>>> >
>>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> >
>>> > interface PlannerIntegration {
>>> >   void initialize(Planner planner, Phase phase);
>>> > }
>>> >
>>> > Right :D
>>> >
>>> > --
>>> > Jacques Nadeau
>>> > CTO and Co-Founder, Dremio
>>> >
>>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <jacques@dremio.com>
>>> wrote:
>>> >
>>> > > A number of us were meeting last week to work through integrating the
>>> > > Phoenix storage plugin. This plugin is interesting because it also uses
>>> > > Calcite for planning. In some ways, this should make integration easy.
>>> > > However, it also allowed us to see certain constraints in how we expose
>>> > > planner integration between storage plugins and Drill internals.
>>> > > Currently, Drill asks the plugin to provide a set of optimizer rules,
>>> > > which it incorporates into one of the many stages of planning. This is
>>> > > too constraining in three ways:
>>> > >
>>> > > 1. it doesn't allow a plugin to decide which phase of planning to
>>> > > integrate with. (This was definitely a problem in the Phoenix case. Our
>>> > > hack solution for now is to incorporate storage plugin rules in phases
>>> > > instead of just one [1].)
>>> > > 2. it doesn't allow arbitrary transformations. Calcite provides a
>>> > > program concept. It may be that a plugin needs to do some of its own
>>> > > work using the Hep planner. Currently there isn't an elegant way to do
>>> > > this in the context of the rule.
>>> > > 3. There is no easy way to incorporate additional planner
>>> > > initialization options. This was almost a problem in the case of the
>>> > > JDBC plugin. It turned out that a hidden integration using register()
>>> > > here [2] allowed us to continue throughout the planning phases.
>>> > > However, we have to register all the rules for all the phases of
>>> > > planning which is a bit unclean. We're hitting the same problem in the
>>> > > case of Phoenix where we need to register materialized views as part of
>>> > > planner initialization but the hack from the JDBC case won't really
>>> > > work.
>>> > >
>>> > > I suggest we update the interface to allow better support for these
>>> > > types of integrations.
>>> > >
>>> > > These seem to be the main requirements:
>>> > > 1. Expose concrete planning phases to storage plugins
>>> > > 2. Allow a storage plugin to provide additional planner initialization
>>> > > behavior
>>> > > 3. Allow a storage plugin to provide rules to include in a particular
>>> > > planning phase (merged with other rules during that phase).
>>> > > 4. (possibly) allow a storage plugin to provide transformation programs
>>> > > that are to be executed in between the concrete planning phases.
>>> > >
>>> > > Item (4) above is the most questionable to me as I wonder whether or
>>> > > not this could simply be solved by creating a transformation rule (or
>>> > > program rule in Calcite's terminology) that creates an alternative tree
>>> > > and thus be solved by (3).
>>> > >
>>> > > A simple solution might be (if we ignore #4):
>>> > >
>>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> > >
>>> > > interface PlannerIntegration {
>>> > >   void initialize(Planner planner, Phase phase);
>>> > > }
>>> > >
>>> > > This way, a storage plugin could register rules (or materialized
>>> > > views) at setup time.
>>> > >
>>> > > What do others think?
>>> > >
>>> > > [1] https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>>> > > [2] https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>>> > >
>>> > > --
>>> > > Jacques Nadeau
>>> > > CTO and Co-Founder, Dremio
>>> > >
>>> >
>>>
>>
>>
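
The PlannerIntegration proposal quoted above can be sketched as self-contained Java. Only the names PlannerIntegration, StoragePlugin, getPlannerIntegrations, initialize, Planner and Phase come from the thread. Everything else is an assumption made for illustration: the two Phase values, the stand-in Planner class that just records rule names, the addRule method, ExamplePlugin, and the rule names themselves. It is a sketch of how a plugin might register different rules per phase, not Drill's actual interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical phase names; the thread does not enumerate Drill's phases.
enum Phase { LOGICAL, PHYSICAL }

// Stand-in for the real planner; it only records the rule names it is given.
class Planner {
    final List<String> rules = new ArrayList<>();

    void addRule(String ruleName) {
        rules.add(ruleName);
    }
}

// Shape taken from the proposal: one callback, invoked per planning phase.
interface PlannerIntegration {
    void initialize(Planner planner, Phase phase);
}

interface StoragePlugin {
    PlannerIntegration getPlannerIntegrations();
}

// Hypothetical plugin that contributes a different rule to each phase.
class ExamplePlugin implements StoragePlugin {
    @Override
    public PlannerIntegration getPlannerIntegrations() {
        return (planner, phase) -> {
            switch (phase) {
                case LOGICAL:
                    planner.addRule("ExampleFilterPushdown");
                    break;
                case PHYSICAL:
                    planner.addRule("ExampleIndexScan");
                    break;
            }
        };
    }
}
```

Under these assumptions, the planner would call initialize once per phase, and the plugin decides inside the callback what to register, which is where materialized views or other initialization could also be hooked in.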
