drill-dev mailing list archives

From Aman Sinha <asi...@maprtech.com>
Subject Re: Improvements to storage plugin planning integration support
Date Thu, 22 Oct 2015 18:48:19 GMT
Thanks Maryann and Jinfeng for your comments.   I understand the Phoenix
approach better now that Maryann clarified that the index is actually a
projection of some or all columns (non primary key columns) of the table.
In the relational world, this is similar to what systems such as Vertica
have done.

Aman

On Thu, Oct 22, 2015 at 11:32 AM, Maryann Xue <maryann.xue@gmail.com> wrote:

> Thank you JinFeng for the education on Drill planning! That probably
> justifies putting secondary index into physical planning.
> What I was trying to say was that a secondary index is not "a faster
> physical access mechanism"; it is just a Phoenix table. And it makes a big
> difference in planning related to Sort, Join and Aggregate, as you said. In
> the pure Calcite world, this is more of a logical thing.
>
>
> Thanks,
> Maryann
>
> On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>
>> I do not know how Phoenix's planning works. For Drill, my
>> understanding is that during logical planning, the "collation" trait is
>> only used in SortRemoveRule, to remove a redundant sort operator. (Those
>> "sort" operators are the ones created by Calcite for a user-explicit
>> "ORDER BY" / "LIMIT", not the "enforcer" created in physical
>> planning.)
>>
>> The "collation" trait would not have an impact in logical planning for
>> join / aggregation. The decision between sort-based and hash-based
>> join / aggregation is made in physical planning. At that stage, the
>> "collation" matters a lot, as it determines whether Drill has to
>> add an "enforcer" to obtain a certain trait in order to get a plan with
>> sort-based join / aggregation.
>>
>> The "collation" trait acts like a physical property, so it's more natural
>> to expose "collation" in physical planning instead of logical
>> planning, which focuses more on properties inherent in the relational
>> expression. Aman's view that a secondary index is part of physical
>> planning makes sense to me.
>>
>> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <maryann.xue@gmail.com> wrote:
>> > Hi Aman Sinha,
>> >
>> > Yes, Phoenix uses materialization in Calcite to model its secondary
>> > index querying. But it's not right to say "In that sense, it would seem
>> > to fit into physical planning phase rather than logical, since indexes
>> > are a faster physical access mechanism for a scan.  The logical
>> > properties of a table don't change due to presence of an index."
>> >
>> > A secondary index in Phoenix is a projection of part or all of the
>> > columns of the original table, and is usually indexed (and sorted) on a
>> > key other than the primary key of the original table. The key in a
>> > Phoenix table (HBase table) is crucial in two ways:
>> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
>> > 2. Ordering
>> >
>> > The second aspect is represented in Calcite by the "collation" trait,
>> > which can make a radical difference in logical planning. Replacing the
>> > original table with one of its indices might end up changing the whole
>> > plan.
>> >
>> > I am not sure yet which stage the Phoenix materialization will
>> > eventually go into, but one thing is certain: it should be available for
>> > all the general optimizations to take effect.
>> >
>> >
>> > Thanks,
>> > Maryann
>> >
>> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <asinha@maprtech.com> wrote:
>> >
>> >> Catching up on this thread.  Jacques, if I understand correctly, you
>> >> are proposing that instead of the single point of initialization of
>> >> rules when we instantiate FrameworkConfig (in DrillSqlWorker), we would
>> >> have more entry points to plug into different phases of planning, and
>> >> storage plugins would register different sets of rules in these
>> >> separate phases. That seems fine to me (assuming that there are no side
>> >> effects where we somehow end up increasing the search space for the
>> >> existing plans).
>> >>
>> >> When talking about the Phoenix integration or the JDBC storage plugin,
>> >> I am curious about which phase(s) they would register their rules for.
>> >> I believe Phoenix's materialized view usage in Calcite is actually for
>> >> secondary indexing, not for materialized views per se.  In that sense,
>> >> it would seem to fit into physical planning phase rather than logical,
>> >> since indexes are a faster physical access mechanism for a scan.  The
>> >> logical properties of a table don't change due to presence of an index.
>> >>
>> >> On the other hand, I think the JDBC plugin might register rules for the
>> >> logical phase, since it would have filter and projection pushdowns that
>> >> do change logical properties.
>> >>
>> >> Aman
>> >>
>> >>
>> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>> >>
>> >>> I would +1 (1-3) for sure. I do not have much understanding of
>> >>> programs; however, additional flexibility for storage plugin devs
>> >>> sounds cool in general when used responsibly =) so +0 for (4)
>> >>>
>> >>>
>> >>> -H+
>> >>>
>> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>> >>>
>> >>> > The dead air must mean that everyone is onboard with my
>> >>> > recommendation
>> >>> >
>> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>> >>> >
>> >>> > interface PlannerIntegration{
>> >>> >   void initialize(Planner, Phase)
>> >>> > }
>> >>> >
>> >>> > Right :D
>> >>> >
>> >>> > --
>> >>> > Jacques Nadeau
>> >>> > CTO and Co-Founder, Dremio
>> >>> >
>> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <jacques@dremio.com> wrote:
>> >>> >
>> >>> > > A number of us were meeting last week to work through integrating
>> >>> > > the Phoenix storage plugin. This plugin is interesting because it
>> >>> > > also uses Calcite for planning. In some ways, this should make
>> >>> > > integration easy. However, it also allowed us to see certain
>> >>> > > constraints in how we expose planner integration between storage
>> >>> > > plugins and Drill internals. Currently, Drill asks the plugin to
>> >>> > > provide a set of optimizer rules which it incorporates into one of
>> >>> > > the many stages of planning. This is too constraining in three ways:
>> >>> > >
>> >>> > > 1. It doesn't allow a plugin to decide which phase of planning to
>> >>> > > integrate with. (This was definitely a problem in the Phoenix case.
>> >>> > > Our hack solution for now is to incorporate storage plugin rules in
>> >>> > > multiple phases instead of just one [1].)
>> >>> > > 2. It doesn't allow arbitrary transformations. Calcite provides a
>> >>> > > program concept. It may be that a plugin needs to do some of its
>> >>> > > own work using the Hep planner. Currently there isn't an elegant
>> >>> > > way to do this in the context of the rule.
>> >>> > > 3. There is no easy way to incorporate additional planner
>> >>> > > initialization options. This was almost a problem in the case of
>> >>> > > the JDBC plugin. It turned out that a hidden integration using
>> >>> > > register() here [2] allowed us to continue throughout the planning
>> >>> > > phases. However, we have to register all the rules for all the
>> >>> > > phases of planning, which is a bit unclean. We're hitting the same
>> >>> > > problem in the case of Phoenix, where we need to register
>> >>> > > materialized views as part of planner initialization, but the hack
>> >>> > > from the JDBC case won't really work.
>> >>> > >
>> >>> > > I suggest we update the interface to allow better support for these
>> >>> > > types of integrations.
>> >>> > >
>> >>> > > These seem to be the main requirements:
>> >>> > > 1. Expose concrete planning phases to storage plugins.
>> >>> > > 2. Allow a storage plugin to provide additional planner
>> >>> > > initialization behavior.
>> >>> > > 3. Allow a storage plugin to provide rules to include in a particular
>> >>> > > planning phase (merged with other rules during that phase).
>> >>> > > 4. (Possibly) allow a storage plugin to provide transformation
>> >>> > > programs that are to be executed in between the concrete planning
>> >>> > > phases.
>> >>> > >
>> >>> > > Item (4) above is the most questionable to me, as I wonder whether
>> >>> > > or not this could simply be solved by creating a transformation
>> >>> > > rule (or program rule in Calcite's terminology) that creates an
>> >>> > > alternative tree, and thus be solved by (3).
>> >>> > >
>> >>> > > A simple solution might be (if we ignore #4):
>> >>> > >
>> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>> >>> > >
>> >>> > > interface PlannerIntegration{
>> >>> > >   void initialize(Planner, Phase)
>> >>> > > }
>> >>> > >
>> >>> > > This way, a storage plugin could register rules (or materialized
>> >>> > > views) at setup time.
>> >>> > >
>> >>> > > What do others think?
>> >>> > >
>> >>> > > [1] https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>> >>> > > [2] https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>> >>> > >
>> >>> > > --
>> >>> > > Jacques Nadeau
>> >>> > > CTO and Co-Founder, Dremio
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >>
>>
>
>
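
For reference, a minimal Java sketch of the interface proposed at the start of the thread. Only `PlannerIntegration`, `StoragePlugin.getPlannerIntegrations()` and `initialize(Planner, Phase)` come from the proposal. Everything else is an assumption made for illustration: the `Phase` values, the stand-in `Planner` class with `addRule` and `addMaterialization`, the two example plugins, the rule names and the index name. None of these are real Drill or Calcite APIs, and the phase in which the Phoenix-like plugin registers its index is picked arbitrarily, since that is exactly the open question discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical phase names; the thread does not enumerate Drill's phases.
enum Phase { LOGICAL, PHYSICAL }

// Stand-in for the planner handed to plugins; not a real Drill or Calcite class.
final class Planner {
  private final Map<Phase, List<String>> rulesByPhase = new EnumMap<>(Phase.class);
  private final List<String> materializations = new ArrayList<>();

  void addRule(Phase phase, String ruleName) {
    rulesByPhase.computeIfAbsent(phase, p -> new ArrayList<>()).add(ruleName);
  }

  void addMaterialization(String name) {
    if (!materializations.contains(name)) materializations.add(name);
  }

  List<String> rules(Phase phase) {
    return rulesByPhase.getOrDefault(phase, Collections.emptyList());
  }

  List<String> materializations() { return materializations; }
}

// The interface shape proposed in the thread, with parameter names added.
interface PlannerIntegration {
  void initialize(Planner planner, Phase phase);
}

interface StoragePlugin {
  PlannerIntegration getPlannerIntegrations();
}

// Registers pushdown rules only for the logical phase (rule names are made up).
final class JdbcLikePlugin implements StoragePlugin {
  public PlannerIntegration getPlannerIntegrations() {
    return (planner, phase) -> {
      if (phase == Phase.LOGICAL) {
        planner.addRule(phase, "JdbcFilterPushdown");
        planner.addRule(phase, "JdbcProjectPushdown");
      }
    };
  }
}

// Registers a secondary index as a materialization; the phase is an arbitrary choice.
final class PhoenixLikePlugin implements StoragePlugin {
  public PlannerIntegration getPlannerIntegrations() {
    return (planner, phase) -> {
      if (phase == Phase.PHYSICAL) {
        planner.addMaterialization("orders_by_customer_idx");
      }
    };
  }
}

public class PlannerIntegrationSketch {
  // Planner-side setup: give every plugin a chance to hook into every phase.
  static Planner setUp(List<StoragePlugin> plugins) {
    Planner planner = new Planner();
    for (Phase phase : Phase.values()) {
      for (StoragePlugin plugin : plugins) {
        plugin.getPlannerIntegrations().initialize(planner, phase);
      }
    }
    return planner;
  }

  public static void main(String[] args) {
    Planner planner = setUp(Arrays.asList(new JdbcLikePlugin(), new PhoenixLikePlugin()));
    System.out.println("logical rules:    " + planner.rules(Phase.LOGICAL));
    System.out.println("physical rules:   " + planner.rules(Phase.PHYSICAL));
    System.out.println("materializations: " + planner.materializations());
  }
}
```

With this shape each plugin decides per phase what to contribute, which covers requirements (1) to (3) from the original proposal; transformation programs, item (4), are left out as in the "if we ignore #4" variant.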

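The point about "collation" and the "enforcer" can be illustrated with a toy Java sketch. It assumes a deliberately simplified model: a collation is just an ordered list of column names, and an input satisfies a requirement only when the required keys are a leading prefix of its sort keys. The class, method, table and index names are all hypothetical, and the plan strings are illustrative labels, not Drill operator names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollationSketch {
  // True when the input is already sorted on requiredKeys as a leading prefix.
  static boolean satisfies(List<String> inputCollation, List<String> requiredKeys) {
    return requiredKeys.size() <= inputCollation.size()
        && inputCollation.subList(0, requiredKeys.size()).equals(requiredKeys);
  }

  // Builds a sort-based aggregate, adding a Sort "enforcer" only when needed.
  static String sortBasedAggregate(String scan, List<String> inputCollation,
                                   List<String> groupKeys) {
    String input = satisfies(inputCollation, groupKeys) ? scan : "Sort(" + scan + ")";
    return "SortAggregate(" + input + ")";
  }

  public static void main(String[] args) {
    List<String> groupKeys = Collections.singletonList("customer_id");

    // Base table sorted on its primary key: an enforcer is required.
    System.out.println(sortBasedAggregate(
        "Scan(orders)", Collections.singletonList("order_id"), groupKeys));
    // prints "SortAggregate(Sort(Scan(orders)))"

    // Secondary index sorted on customer_id: no enforcer is required.
    System.out.println(sortBasedAggregate(
        "Scan(orders_by_customer_idx)", Arrays.asList("customer_id", "order_id"), groupKeys));
    // prints "SortAggregate(Scan(orders_by_customer_idx))"
  }
}
```

Swapping the base table for an index changes which plan shapes avoid the extra sort, which is why the stage at which the index becomes visible to the planner matters.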