hawq-dev mailing list archives

From Lirong Jian <jianlir...@gmail.com>
Subject Re: Planner invoked twice on master
Date Tue, 30 Aug 2016 16:31:02 GMT
Hi Kavinder,

If you are looking for the person "to blame" for the decision to invoke the
planner twice on master, unfortunately, that person would be me, :).

First of all, I left the company (Pivotal Inc.) about half a year ago and am
no longer working on the Apache HAWQ project full time. I have had little time
to keep track of the code changes since then, so what I am going to say may be
inconsistent with the latest code.

*Problem*

To improve the scalability and throughput of the system, we wanted to
implement a dynamic resource allocation mechanism for HAWQ 2.0. In other
words, we wanted to allocate exactly the resources needed (the number of
virtual segments) to run the underlying query, rather than a fixed amount of
resources (as in HAWQ 1.x). To achieve that, we need to calculate the cost of
the query before allocating the resources, which means we need to figure out
the execution plan first. However, in the framework of the current planner
(either the old planner or the new optimizer ORCA), the optimal execution plan
is generated given the to-be-run query and the number of segments. Thus, this
is a chicken-and-egg problem.

*Solution*

IMO, the ideal solution to the above problem is an iterative algorithm: given
a default number of segments, calculate the optimal execution plan; based on
that plan, figure out the appropriate number of segments needed to run the
query; then calculate the optimal execution plan again, and again, until the
result stabilizes.

*Implementation*

In the actual implementation, we set the number of iterations to 2 for two
major reasons: (1) two iterations are enough to give a good result; (2) there
is a non-trivial cost associated with invoking the planner, especially the new
optimizer ORCA.
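
To make the loop concrete, here is a rough sketch of the idea (hypothetical
Python-style pseudocode; run_planner and estimate_vsegs_from_plan are made-up
names for illustration, not actual HAWQ functions):

    # Hypothetical sketch of the capped iterative planning loop; not the actual HAWQ code.
    def plan_with_resource_negotiation(query, default_num_vsegs, max_iterations=2):
        num_vsegs = default_num_vsegs
        plan = run_planner(query, num_vsegs)          # old planner or ORCA
        for _ in range(max_iterations - 1):
            needed = estimate_vsegs_from_plan(plan)   # derive resource need from the plan
            if needed == num_vsegs:                   # result is stable, stop early
                break
            num_vsegs = needed
            plan = run_planner(query, num_vsegs)      # re-plan with the adjusted resources
        return plan, num_vsegs

With max_iterations set to 2, the planner is invoked at most twice on master,
which is the behavior Kavinder observed.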

After implementing the first version, we found that determining the number of
virtual segments based on the planner's cost estimate sometimes gave very bad
results (although this is really an issue with the planner, because the cost
the planner provides does not correctly reflect the actual running cost of the
query). So, borrowing an idea from Hadoop MapReduce, we calculate the cost
based on the total size of all tables that need to be scanned for the
underlying query. It then seemed we no longer needed to invoke the planner
before allocating resources. However, in our current resource manager, the
allocated resources are segment-based, not process-based. For example, if an
execution plan consists of three slices, we need to set up three processes on
each segment to run the query, and one allocated resource unit (virtual
segment) covers all three processes. To avoid starting too many processes on
one physical host, we need to know how many processes (the number of slices of
the execution plan) will be started on one virtual segment when we request
resources from the resource manager. Thus, the execution plan is still needed.
We could write a new function to calculate the number of slices of the plan
rather than invoking the planner, but after some investigation we found that
such a function would do almost the same thing as the planner. So why bother
writing more duplicated code?
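
For illustration, a rough sketch of what the resource request might look like
under this scheme (hypothetical names -- tables_to_scan, count_slices,
data_per_vseg, max_processes_per_host -- are mine, not the actual HAWQ
implementation):

    # Hypothetical sketch of sizing the request from scan size and slice count.
    def build_resource_request(query, plan, data_per_vseg, max_processes_per_host):
        total_scan_size = sum(t.size for t in tables_to_scan(query))
        num_vsegs = max(1, total_scan_size // data_per_vseg)    # MapReduce-style sizing
        num_slices = count_slices(plan)                         # why the plan is still needed
        # each virtual segment starts one process per slice, so cap vsegs per host
        max_vsegs_per_host = max(1, max_processes_per_host // num_slices)
        return num_vsegs, max_vsegs_per_host

The number of virtual segments comes from the total scan size, but the slice
count still has to come from an execution plan, which is why the planner runs
before the resource request is issued.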

*Engineering Consideration*

IMO, in the long term, the best solution is probably to embed the logic of
resource negotiation into the planner. In that case, the output of the planner
would consist of the needed number of virtual segments together with the
associated optimal execution plan, and the planner could be invoked just once
on master.

However, back then, we decided to keep the functionality of resource
negotiation completely separate from the planner. Although it may look a
little ugly from an architectural point of view, it saved us a lot of
code-refactoring effort and communication cost among different teams. We did
have a release deadline, :).

The above is just my 2 cents.

Best,
Lirong

Lirong Jian
HashData Inc.

2016-08-30 1:42 GMT+08:00 Goden Yao <godenyao@apache.org>:

> Some background info:
>
> HAWQ 2.0 currently does not do dynamic resource allocation for PXF queries
> (External Table).
> It was a compromise we made, as PXF used to have its own allocation logic
> and we didn't get a chance to converge that logic with HAWQ 2.0.
> So to keep it compatible (on performance) with HAWQ 1.x, the current logic
> assumes external table queries need 8 segments per node to execute
> (e.g. with 3 nodes in the cluster, it will need 24 segments).
> If that allocation fails, the query will fail and the user will see an
> error message like "do not have sufficient resources" or "segments" to
> execute the query.
>
> As I understand it, the 1st call is to get the fragment info, and the 2nd
> call is to optimize the allocation of fragments to segments based on the
> info obtained from the 1st call and generate the optimized plan.
>
> -Goden
>
> On Mon, Aug 29, 2016 at 10:31 AM Kavinder Dhaliwal <kdhaliwal@pivotal.io>
> wrote:
>
> > Hi,
> >
> > Recently I was looking into the issue of PXF receiving multiple REST
> > requests to the fragmenter. Based on offline discussions I have a rough
> > idea that this is happening because HAWQ plans every query twice on the
> > master. I understand that this is to allow the resource negotiation that
> > was a feature of HAWQ 2.0. I'd like to know if anyone on the mailing list
> > can give more background on the history of the decision-making behind
> > this change for HAWQ 2.0 and whether this is only a short-term solution.
> >
> > Thanks,
> > Kavinder
> >
>
