hawq-dev mailing list archives

From Hubert Zhang <hzh...@pivotal.io>
Subject Re: Planner invoked twice on master
Date Wed, 31 Aug 2016 05:14:20 GMT
Hi Kavinder,
If your "multiple REST requests" problem is really caused by the planner being
invoked twice, my suggestion is to avoid the related function call (maybe a
call to the REST service) at the first planning stage.

You can refer to src/backend/optimizer/util/clauses.c:2834, which uses
is_in_planning_phase() to determine whether the legacy planner is at the first
or the second stage.
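
A minimal sketch of the suggestion above, written in Python rather than the HAWQ C code, assuming the planner runs exactly two passes. All names here (PlannerState, FakeFragmenter, plan_external_scan, plan_query_twice) are hypothetical stand-ins, not HAWQ or PXF APIs; only the idea of checking the planning stage before making the expensive call comes from the suggestion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlannerState:
    """Hypothetical flag saying which planning pass is running."""
    first_pass: bool = True


@dataclass
class FakeFragmenter:
    """Hypothetical stand-in for the REST service; counts its calls."""
    calls: int = 0

    def fetch_fragments(self, table: str) -> List[str]:
        self.calls += 1
        return [f"{table}-frag-{i}" for i in range(3)]


def plan_external_scan(state: PlannerState,
                       fragmenter: FakeFragmenter,
                       table: str) -> Optional[List[str]]:
    # Skip the expensive remote call during the first planning pass.
    if state.first_pass:
        return None
    return fragmenter.fetch_fragments(table)


def plan_query_twice(fragmenter: FakeFragmenter,
                     table: str) -> Optional[List[str]]:
    """Run both planning passes; only the second one reaches the service."""
    state = PlannerState(first_pass=True)
    plan_external_scan(state, fragmenter, table)  # pass 1: no request sent
    state.first_pass = False
    return plan_external_scan(state, fragmenter, table)  # pass 2: one request
```

With this guard the service sees one request per query even though planning runs twice. Whether the first pass can really do without the fragment data depends on what that pass uses it for, which this sketch does not model.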


On Wed, Aug 31, 2016 at 12:31 AM, Lirong Jian <jianlirong@gmail.com> wrote:

> Hi Kavinder,
> If you are looking for someone "to blame" for the decision to invoke the
> planner twice on master, unfortunately, that would be me, :).
> First of all, I left the company (Pivotal Inc.) about half a year ago and am
> no longer working on the Apache HAWQ project full time. I have little time
> to keep track of the code changes these days, so what I am going to say is
> probably inconsistent with the latest code.
> *Problem*
> To improve the scalability and throughput of the system, we wanted to
> implement a dynamic resource allocation mechanism for HAWQ 2.0. In other
> words, we wanted to allocate exactly the resources needed (the number of
> virtual segments), rather than a fixed amount of resources (as in HAWQ 1.x),
> to run the underlying query. To achieve that, we need to calculate the cost
> of the query before allocating the resources, which means we need to figure
> out the execution plan first. However, in the framework of the current
> planner (either the old planner or the new optimizer ORCA), the optimal
> execution plan is generated given the to-be-run query and the number of
> segments. Thus, this is a chicken-and-egg problem.
> *Solution*
> IMO, the ideal solution for the above problem is to use an iterative
> algorithm: given a default number of segments, calculate the optimal
> execution plan; based on the output optimal execution plan, figure out the
> appropriate number of segments needed to run this query; and calculate the
> optimal execution plan again, and again, until the result is stable.
> *Implementation*
> In the actual implementation, we set the number of iterations to 2 for two
> major reasons: (1) two iterations are enough to give a good result; (2)
> there is some cost associated with invoking the planner, especially the new
> optimizer ORCA.
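
The iterative scheme quoted above, capped at two planner invocations, can be sketched roughly as follows. This is an illustration in Python, not the HAWQ implementation: plan_for and segments_for are hypothetical callbacks standing in for the planner and for the step that derives a segment count from a plan, and the early exit when the count is already stable is an assumption.

```python
from typing import Callable, Tuple, TypeVar

Plan = TypeVar("Plan")


def plan_with_resource_negotiation(
    plan_for: Callable[[int], Plan],      # hypothetical: planner for N segments
    segments_for: Callable[[Plan], int],  # hypothetical: segments a plan wants
    default_segments: int,
    max_iterations: int = 2,
) -> Tuple[Plan, int]:
    """Plan, derive a segment count from the plan, and re-plan.

    The planner is invoked at most max_iterations times.
    """
    segments = default_segments
    plan = plan_for(segments)
    for _ in range(max_iterations - 1):
        wanted = segments_for(plan)
        if wanted == segments:
            break  # result is already stable, no need to plan again
        segments = wanted
        plan = plan_for(segments)
    return plan, segments
```

With max_iterations=2 this is the "plan, negotiate, plan again" sequence: the first call uses the default segment count and the second uses the count derived from the first plan.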
> After implementing the first version, we later found that determining the
> number of virtual segments based on the cost of the query sometimes gave
> very bad results (although this is an issue with the planner, because the
> cost the planner provides does not correctly reflect the actual running cost
> of the query). So, borrowing the idea from Hadoop MapReduce, we calculate the
> cost based on the total size of all tables that need to be scanned for the
> underlying query. It seemed we no longer needed to invoke the planner before
> allocating resources. However, in our current resource manager, the
> allocated resource is segment-based, not process-based. For example, if an
> execution plan consists of three slices, we need to set up three processes
> on each segment to run the query, and one allocated resource unit (virtual
> segment) covers all three processes. To avoid the case where too many
> processes are started on one physical host, we need to know how many
> processes (the number of slices of the execution plan) are going to start on
> one virtual segment when we request resources from the resource manager.
> Thus, the execution plan is still needed. We could write a new function to
> calculate the number of slices of the plan rather than invoking the planner,
> but after some investigation, we found that the new function did almost the
> same thing as the planner. So, why bother writing more duplicated code?
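
The slice arithmetic in the quoted paragraph can be written out as a small Python sketch. It assumes each virtual segment starts one process per slice, as the three-slice example describes; the function names and the max_processes limit are hypothetical and do not correspond to any HAWQ setting.

```python
def processes_per_host(slices: int, virtual_segments_on_host: int) -> int:
    """Processes started on one host: one per slice per virtual segment."""
    if slices < 0 or virtual_segments_on_host < 0:
        raise ValueError("counts must be non-negative")
    return slices * virtual_segments_on_host


def fits_on_host(slices: int, virtual_segments_on_host: int,
                 max_processes: int) -> bool:
    """Check against a hypothetical per-host process limit."""
    return processes_per_host(slices, virtual_segments_on_host) <= max_processes
```

For the three-slice plan in the example, a host carrying 4 virtual segments would run 12 processes, which is why the slice count has to be known before resources are requested.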
> *Engineering Consideration*
> IMO, for the long term, maybe the best solution is to embed the logic
> of resource negotiation into the planner. In that case, the output of the
> planner consists of the needed number of virtual segments and the
> associated optimal execution plan. The planner can be invoked just once on
> master.
> However, back then, we decided to separate the functionality of resource
> negotiation and the planner completely. Although it may look a little ugly
> from an architectural point of view, it saved us a lot of code refactoring
> effort and communication cost among different teams. We did have a release
> deadline, :).
> Above is just my 2 cents.
> Best,
> Lirong
> Lirong Jian
> HashData Inc.
> 2016-08-30 1:42 GMT+08:00 Goden Yao <godenyao@apache.org>:
> > Some background info:
> >
> > HAWQ 2.0 right now doesn't do dynamic resource allocation for PXF queries
> > (External Table).
> > It was a compromise we made as PXF used to have its own allocation logic
> > and we didn't get a chance to converge the logic with HAWQ 2.0.
> > So to make it compatible (on performance) with HAWQ 1.x, the current logic
> > will assume external table queries need 8 segments per node to execute
> > (e.g. if there are 3 nodes in the cluster, it'll need 24 segments).
> > If that allocation fails, the query will fail and the user will see an
> > error message like "do not have sufficient resources" or "segments" to
> > execute the query.
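
The fixed allocation rule quoted above is simple arithmetic, sketched here in Python. The constant 8 comes from the message; the function names and the available-segment check are hypothetical and only illustrate when the allocation would fail.

```python
SEGMENTS_PER_NODE = 8  # figure quoted in the message for external table queries


def required_segments(nodes: int) -> int:
    """Segments an external table query is assumed to need."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return SEGMENTS_PER_NODE * nodes


def can_allocate(nodes: int, available_segments: int) -> bool:
    """Hypothetical check: the query fails if this returns False."""
    return available_segments >= required_segments(nodes)
```

For a 3-node cluster, required_segments(3) gives 24, matching the example in the message.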
> >
> > As I understand it, the 1st call gets the fragment info, and the 2nd call
> > optimizes the allocation of fragments to segments based on the info from
> > the 1st call and generates the optimized plan.
> >
> > -Goden
> >
> > On Mon, Aug 29, 2016 at 10:31 AM Kavinder Dhaliwal <kdhaliwal@pivotal.io>
> > wrote:
> >
> > > Hi,
> > >
> > > Recently I was looking into the issue of PXF receiving multiple REST
> > > requests to the fragmenter. Based on offline discussions I have got a
> > > rough idea that this is happening because HAWQ plans every query twice
> > > on the master. I understand that this is to allow resource negotiation
> > > that was a feature of HAWQ 2.0. I'd like to know if anyone on the
> > > mailing list can give any more background into the history of the
> > > decision making behind this change for HAWQ 2.0 and whether this is only
> > > a short term solution.
> > >
> > > Thanks,
> > > Kavinder
> > >
> >


Hubert Zhang
