hawq-dev mailing list archives

From Shujie Zhang <shzh...@pivotal.io>
Subject Re: a vectorized execution design document
Date Thu, 15 Mar 2018 09:00:41 GMT
Hi,


Over the past few days, while researching how to implement the vectorized
executor of HAWQ, I looked at one solution similar to your suggestion: we
could implement a single generic vectorized data type like this:

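typedef struct vector
{
    int len;          /* number of values in the batch */
    Oid baseType;     /* OID of the element type */
    Datum values[];   /* the values themselves */
} vector;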
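
As a rough illustration only, a batch of this type might be allocated as in the sketch below. The stand-in `Datum` and `Oid` typedefs, the use of `malloc` instead of a PostgreSQL memory context, and the helper name `vector_alloc` are all assumptions made so that the example compiles on its own; none of them come from the HAWQ code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the PostgreSQL typedefs so that the sketch compiles alone. */
typedef uintptr_t Datum;
typedef unsigned int Oid;

#define INT4OID 23 /* OID of int4 in pg_type */

typedef struct vector
{
    int   len;       /* number of values in the batch */
    Oid   baseType;  /* OID of the element type */
    Datum values[];  /* flexible array member holding len values */
} vector;

/* Hypothetical helper: allocate a zero-filled batch of len values. */
static vector *
vector_alloc(int len, Oid baseType)
{
    vector *v;

    assert(len >= 0);
    /* The header and the value array live in one allocation. */
    v = malloc(sizeof(vector) + (size_t) len * sizeof(Datum));
    if (v == NULL)
        return NULL;
    v->len = len;
    v->baseType = baseType;
    for (int i = 0; i < len; i++)
        v->values[i] = (Datum) 0;
    return v;
}
```

For example, `vector_alloc(1024, INT4OID)` would give one batch of 1024 int4 slots; the element type is known only at run time through `baseType`.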

I found two problems with it:

1. When we implement an operator for this type, only one function can be
added per operator, so that function has to use a big switch ... case ... to
handle the different combinations of base types. Compared with implementing
a separate vtype for each type, this is not flexible.
For example, this is the function for the + operator:

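Datum vtype_vtype_pl(vector *v1, vector *v2)
{
    /* dispatch on the base type of each operand */
    switch (v1->baseType)
    {
        case INT2OID:
            switch (v2->baseType)
            {
                case INT2OID:
                    return vint2vint2pl(v1, v2);
                case INT4OID:
                    return vint2vint4pl(v1, v2);
                ……
            }
        ……
    }
}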

2. The other problem arises when we check whether an expression can be
vectorized: we have to check that every Var and every function in it has a
vectorized version, and with this design that is difficult to check.
For example, if the operator function is int2int2pl and we want to check
whether it has a vectorized version, we can find that a function
vtype_vtype_pl exists, but we cannot tell whether vint2vint2pl is
implemented inside vtype_vtype_pl. We could create a map to cover this
situation, but that does not seem like a good solution.
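
To make the two problems concrete, the "map" mentioned above could be sketched as a lookup table keyed by the operand type pair. This is only a sketch under assumptions: the table `pl_kernels`, the helpers `lookup_pl_kernel` and `vtype_pl_is_vectorized`, the kernel signatures, and the stand-in `Datum`/`Oid` typedefs are hypothetical and not HAWQ code. The kernels also skip the overflow checks a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>   /* NULL, size_t */

typedef uintptr_t Datum;   /* stand-in for PostgreSQL's Datum */
typedef unsigned int Oid;  /* stand-in for PostgreSQL's Oid */

#define INT2OID 21         /* OID of int2 in pg_type */
#define INT4OID 23         /* OID of int4 in pg_type */

typedef struct vector
{
    int   len;
    Oid   baseType;
    Datum values[];
} vector;

/* Assumed kernel signature: element-wise l + r written into out. */
typedef void (*vkernel)(const vector *l, const vector *r, vector *out);

static void
vint2vint2pl(const vector *l, const vector *r, vector *out)
{
    assert(l->len == r->len && out->len == l->len);
    out->baseType = INT2OID;
    for (int i = 0; i < l->len; i++)
        out->values[i] = (Datum) (int16_t) ((int16_t) l->values[i] +
                                            (int16_t) r->values[i]);
}

static void
vint2vint4pl(const vector *l, const vector *r, vector *out)
{
    assert(l->len == r->len && out->len == l->len);
    out->baseType = INT4OID;
    for (int i = 0; i < l->len; i++)
        out->values[i] = (Datum) (int32_t) ((int16_t) l->values[i] +
                                            (int32_t) r->values[i]);
}

/* The hypothetical map: one row per implemented (left, right) pair. */
typedef struct vkernel_entry
{
    Oid     left;
    Oid     right;
    vkernel fn;
} vkernel_entry;

static const vkernel_entry pl_kernels[] = {
    { INT2OID, INT2OID, vint2vint2pl },
    { INT2OID, INT4OID, vint2vint4pl },
};

static vkernel
lookup_pl_kernel(Oid left, Oid right)
{
    for (size_t i = 0; i < sizeof(pl_kernels) / sizeof(pl_kernels[0]); i++)
        if (pl_kernels[i].left == left && pl_kernels[i].right == right)
            return pl_kernels[i].fn;
    return NULL;
}

/* Plan-time check: is + vectorized for this pair of base types? */
static bool
vtype_pl_is_vectorized(Oid left, Oid right)
{
    return lookup_pl_kernel(left, right) != NULL;
}

/* Single entry point for +; returns false if no kernel applies. */
static bool
vtype_vtype_pl(const vector *v1, const vector *v2, vector *out)
{
    vkernel fn = lookup_pl_kernel(v1->baseType, v2->baseType);

    if (fn == NULL || v1->len != v2->len || out->len != v1->len)
        return false;
    fn(v1, v2, out);
    return true;
}
```

In this sketch the same table answers the plan-time question and drives the run-time dispatch, so the two cannot drift apart, but it is still one more structure per operator that has to be maintained by hand, which is the drawback noted above.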

I also don't think that creating a vtype for each type is good: it pushes
more complexity onto users, and it may create HUGE metadata in the system
tables and lead to performance degradation. So this should remain a matter
of continual improvement.

Thank you, Kuien.

Zhang Shujie






On Thu, Mar 15, 2018 at 3:10 PM, 刘奎恩(局外) <kuien.lke@alibaba-inc.com> wrote:

> My two cents on VType: could we introduce a general vector type, for example,
> TuplesView, as the base data unit/structure (holding a set of tuples, e.g.,
> 1024 tuples, as specified by a GUC value) for vectorized operators?  It is
> similar to a Set of Record but used by executors. Then we might not need to
> create a VType for each Type.
>
>
> -------------——
> Kuien Liu/奎恩
>
> ------------------------------------------------------------------
> From: 刘奎恩(局外) <kuien.lke@alibaba-inc.com>
> Sent: Thursday, March 1, 2018 15:19
> To: dev <dev@hawq.incubator.apache.org>; Shujie Zhang <shzhang@pivotal.io>
> Subject: Re: a vectorized execution design document
>
> Thanks to Shujie for the helpful reply.  Yes, it is transparent to the upper
> logic, which follows the Volcano model to evaluate cost and generate the plan.
> Once we have (mostly) finished the vectorization work, we may look into a
> vectorization-aware QO that considers batch-at-a-time or operator-at-a-time
> execution rather than tuple-at-a-time.
>
>
> -------------——
> Kuien Liu/奎恩
>
> ------------------------------------------------------------------
> From: Shujie Zhang <shzhang@pivotal.io>
> Sent: Tuesday, February 27, 2018 09:41
> To: dev <dev@hawq.incubator.apache.org>; 刘奎恩(局外) <kuien.lke@alibaba-inc.com>
> Subject: Re: a vectorized execution design document
>
> Hi,
>
> We check whether each plan node can be vectorized after the Plan has been
> generated. In this phase only the cheapest Plan has been selected, so we
> have no chance to change it.
>
> If we want to generate the vectorized Plan in the optimizer, we should
> generate the vectorized Path and compute its cost; then we can compare the
> costs of both Paths and choose the cheaper one. The trouble is that both the
> built-in optimizer and ORCA would have to be refactored, which is complex
> work :).  Another trouble is that the solution space of the optimizer would
> become larger because of the new type of Path, so the planning time would
> have to be kept under control.
>
> In this design we change the Plan after it has been generated, which is
> transparent to the upper modules, so the optimizer can also be changed to
> fit the current vectorized Plan in the future.
>
> Thanks,
> Zhang Shujie
>
> On Mon, Feb 26, 2018 at 3:01 PM, 刘奎恩(局外) <kuien.lke@alibaba-inc.com>
> wrote:
> Nice doc, clear design. It is a good start! I saw that an example on
> aggregation is illustrated in the doc; we may implement more operators
> with this design, for example SORT and JOIN.
> One question: we implement vectorization under the plan tree, that is, the
> optimizer cannot see the change this way, and it still estimates the overall
> cost like
> ' total_cost = startup_cost + cpu_per_tuple * tuples + seq_page_cost *
> pages '
> In my opinion, the second part (CPU costs) changes a lot, so this should be
> a staged design. Is there any further plan for it?
> -------------——
> Kuien Liu/奎恩
> ------------------------------------------------------------------
> From: Shujie Zhang <shzhang@pivotal.io>
> Sent: Friday, February 9, 2018 16:35
> To: dev <dev@hawq.incubator.apache.org>
> Subject: a vectorized execution design document
> Hi,
>
> A vectorized execution design document has been uploaded
> to issue HAWQ-1450:
> https://issues.apache.org/jira/browse/HAWQ-1450
>
> The document contains a lot of ideas about how to implement a vectorized
> executor. We welcome any comments on the content and suggestions for
> improvement. Thanks.
>
> Zhang Shujie
> 2018-02-09
>
>
>
