hawq-dev mailing list archives

From huor <...@git.apache.org>
Subject [GitHub] incubator-hawq pull request: HAWQ-372. Fix single row insert and C...
Date Mon, 01 Feb 2016 02:50:29 GMT
GitHub user huor opened a pull request:


    HAWQ-372. Fix single row insert and COPY hang in high concurrent workloads

    Root cause analysis shows that the hang in concurrent workloads has three causes:
    1. Most queries lock relations first and then allocate query resources, while analyze
    (whether triggered manually or automatically) allocates query resources first and then locks
    the relation. Under concurrent workloads, this opposite ordering can lead to a deadlock
    between relation locks and query resources.
    2. Some queries allocate query resources multiple times, namely SRI/COPY, which trigger
    automatic statistics collection (analyze). Some of them do not return the query resource
    for a sub-task as soon as that sub-task is done.
    For example, SRI allocates resources for the insert itself, performs the insertion, returns
    the resources for the insertion, allocates resources for the automatically triggered
    analyze, runs analyze, and returns the resources for analyze. COPY, by contrast, allocates
    resources for the COPY itself, performs the COPY, allocates resources for the automatically
    triggered analyze, runs analyze, returns the resources for analyze, and only then returns
    the resources for the COPY. As a result, much of the query resource pool can be occupied by
    COPY statements that are still trying to allocate more resources for analyze. Some COPY
    statements then stay pending on the allocation for analyze, which looks like a "deadlock"
    on query resources.
    3. SRI/COPY allocate query resources multiple times, while TPC-H queries allocate them only
    once. TPC-H queries usually take longer to complete, so an SRI/COPY statement may be halfway
    through when it requests a second allocation for a sub-task. If all resources are busy at
    that moment, SRI/COPY must wait for TPC-H queries to return resources before proceeding. As
    a consequence, from the user's standpoint, SRI/COPY run very slowly or even hang.
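    As a rough illustration of cause 1, the following is a hypothetical, deterministic model, not HAWQ code. It assumes that `simulate`, `relation_lock`, and `query_resource` are illustrative names only, that the relation lock conflicts between the two tasks (real lock modes are more nuanced), and that the query resource pool has just one free unit left. Each task acquires its resources in a fixed order and releases everything once it holds all of them. With opposite acquisition orders the tasks deadlock; when both lock the relation first, as in fix 1, they complete.

```python
from typing import Dict, List


def simulate(tasks: Dict[str, List[str]]) -> str:
    """Deterministic round-robin model of tasks that each acquire a list of
    exclusive resources in a fixed order, then release everything at once.

    Hypothetical model: resource names are labels, not HAWQ objects.
    Returns "completed" if every task finishes, or "deadlock" if a full
    round passes in which no task can acquire the resource it waits for.
    """
    owner: Dict[str, str] = {}  # resource -> task currently holding it
    remaining = {t: list(order) for t, order in tasks.items() if order}
    held: Dict[str, List[str]] = {t: [] for t in remaining}
    while remaining:
        progressed = False
        for t in list(remaining):
            want = remaining[t][0]
            if want in owner:  # held by another task: keep waiting
                continue
            owner[want] = t
            held[t].append(want)
            remaining[t].pop(0)
            progressed = True
            if not remaining[t]:  # task has everything: do work, release all
                for res in held[t]:
                    del owner[res]
                del remaining[t]
        if not progressed:  # every waiting task is blocked by another one
            return "deadlock"
    return "completed"


if __name__ == "__main__":
    # Cause 1: INSERT locks first, auto-analyze allocates first.
    print(simulate({"insert": ["relation_lock", "query_resource"],
                    "analyze": ["query_resource", "relation_lock"]}))  # deadlock
    # Fix 1: both lock the relation first, then allocate.
    print(simulate({"insert": ["relation_lock", "query_resource"],
                    "analyze": ["relation_lock", "query_resource"]}))  # completed
```

    The round-robin scheduler keeps the example reproducible: a real deadlock depends on timing, while this model forces the interleaving in which each task grabs its first resource before either asks for its second.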
    To address these issues, we make the following fixes:
    For 1, make sure all queries (especially insert queries and create/alter/drop database
    object queries) follow the pattern of locking the relation first and then allocating query
    resources.
    For 2, make sure queries follow the pattern: allocate query resources for sub-task 1,
    return query resources for sub-task 1, ..., allocate query resources for sub-task N, return
    query resources for sub-task N.
    For 3, as a user-side practice, separate different workloads into different resource
    queues, e.g., SRI/COPY in a load queue and TPC-H in a query queue.
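    The difference between the nested allocation of cause 2 and the sequential pattern of fix 2 can be sketched with a small counting model. This is a hypothetical illustration, not HAWQ's resource manager: `run`, `NESTED`, and `SEQUENTIAL` are made-up names, every allocation is assumed to take exactly one unit from a shared pool, and concurrent COPY sessions are scheduled round-robin, one step at a time. With a pool sized for exactly one allocation per session, the nested pattern gets stuck, while the sequential pattern completes.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # ("alloc" or "free", sub-task label)

# Hypothetical step sequences for one COPY session, following the PR text.
# Cause 2: COPY keeps its own resource while it allocates for auto-analyze.
NESTED: List[Step] = [("alloc", "copy"), ("alloc", "analyze"),
                      ("free", "analyze"), ("free", "copy")]
# Fix 2: each sub-task returns its resource before the next one allocates.
SEQUENTIAL: List[Step] = [("alloc", "copy"), ("free", "copy"),
                          ("alloc", "analyze"), ("free", "analyze")]


def run(sessions: int, capacity: int, steps: List[Step]) -> str:
    """Round-robin model of `sessions` identical tasks sharing a pool of
    `capacity` units, one unit per allocation.

    Returns "completed", or "stuck" if a full round makes no progress.
    """
    free = capacity
    pc = [0] * sessions  # index of each session's next step
    while any(p < len(steps) for p in pc):
        progressed = False
        for i in range(sessions):
            if pc[i] == len(steps):  # session already finished
                continue
            op, _label = steps[pc[i]]
            if op == "alloc":
                if free == 0:  # pool exhausted: wait for a release
                    continue
                free -= 1
            else:
                free += 1
            pc[i] += 1
            progressed = True
        if not progressed:  # nobody can allocate and nobody will release
            return "stuck"
    return "completed"


if __name__ == "__main__":
    # Four concurrent COPY sessions, pool sized for exactly four allocations.
    print(run(4, 4, NESTED))      # stuck
    print(run(4, 4, SEQUENTIAL))  # completed
```

    In the nested case every session holds its COPY unit while waiting for an analyze unit that no session can release, which is the "deadlock" on query resources described in cause 2. The sequential pattern never holds more than one unit per session, so the same pool is always enough.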

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huor/incubator-hawq huor_sri_copy

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #307
commit 69827d9e9bdf2edddbeda4003ba821f7ff956d6b
Author: Ruilong Huo <rhuo@pivotal.io>
Date:   2016-01-28T08:37:53Z

    Fix query hang in concurrent SRI/COPY

commit c99cd1c2ccd17d29a81a3ce2ea464afd8389c1fc
Author: Ruilong Huo <rhuo@pivotal.io>
Date:   2016-01-29T07:20:30Z

    Fix query hang in concurrent SRI/COPY - continue

commit 7849218b4ef5e541a2a050c4d43f4224c45cdf46
Author: Ruilong Huo <rhuo@pivotal.io>
Date:   2016-02-01T02:09:17Z

    Fix query hang in concurrent SRI/COPY - continue 2


