hawq-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi JIN <jinyi.apa...@gmail.com>
Subject Re: Lock is retained on a table
Date Fri, 22 Sep 2017 05:44:33 GMT
Thanks, Grant.

Yes, if that process does not exit, the lock is impossible to be released
as expected till that process decides to release by itself, so it is better
to use pg_cancel_backend(pid) to try to clean it from another session.
Seems the problem is the process does not know the client (a JDBC) has
gone. When JIRA is created, maybe someone knows similar thing or can check
and try to reproduce it to judge what the problem is. To be honest, this is
the first time I hear of this issue.

Using kill -9 can cause processes reset because they share a block of
memory, if one process is killed -9 without a chance to clean up its own
data, the other ones cannot guarantee the data in shared memory are clean
and valid.

Best,
Yi

On Fri, Sep 22, 2017 at 3:14 PM, Grant Krieger <GKrieger@mrpg.com> wrote:

> Hi Yi,
>
>
>
> Thanks for your help.
>
>
>
> Yes we did check the master and the process exists. We could only kill
> with -9 but this seemed to kill most of our other sessions as well.
>
>
>
> Yes I will move this to Jira
>
>
>
> I will pstack and provide you with an update
>
>
>
> Thanks
>
>
>
> Grant
>
>
>
> *From:* Yi JIN [mailto:jinyi.apache@gmail.com]
> *Sent:* 21 September 2017 10:51 PM
>
> *To:* user@hawq.incubator.apache.org
> *Subject:* Re: Lock is retained on a table
>
>
>
> BTW, Grant, do you think it suitable to move this issue to JIRA for
> discussing and tracking?
>
>
>
> Best,
>
> Yi
>
>
>
> On Fri, Sep 22, 2017 at 6:46 AM, Yi JIN <jinyi.apache@gmail.com> wrote:
>
> Hi Grant,
>
>
>
> Did you check on master if the working process exists? For example, ps -ef
> | grep postgres | grep 393937
>
>
>
> If you can find the pid in pg_locks, I think the process should still be
> there.  So maybe you can pstack that process to check the stack, maybe we
> can find more hints.
>
>
>
> Best,
>
> Yi
>
>
>
> On Thu, Sep 21, 2017 at 7:50 PM, Grant Krieger <GKrieger@mrpg.com> wrote:
>
> Hi All,
>
>
>
> Just to add.
>
>
>
> It appears lock still held but there are no active querys running on table
>
>
>
> select * from pg_stat_activity
>
> where procpid = 335492
>
>
>
>
>
> 1075919                edw_performance3       335492  342
> 10           gpadmin              drop table public.test_prod_dim3
> true       2017/09/21 10:51:22 AM               2017/09/21 8:58:55 AM
> 10.1.3.201            61257                    2017/09/21 10:51:22 AM
> false
>
>
>
> select * from pg_stat_activity
>
> where current_query like '%test_prod_dim3%'
>
>
>
>
>
> 1075919                edw_performance3       335492  342
> 10           gpadmin              drop table public.test_prod_dim3
>
> true       2017/09/21 10:51:22 AM               2017/09/21 8:58:55
> AM                10.1.3.201            61257                    2017/09/21
> 10:51:22 AM               false
>
> 1075919                edw_performance3       376465  550
> 10           gpadmin              select * from pg_stat_activity where
> current_query like '%test_prod_dim3%'               false      2017/09/21
> 11:24:51 AM               2017/09/21 10:14:33 AM
> 10.1.3.201            62030                    2017/09/21 11:24:51
> AM               false
>
>
>
> The problem seems to exist when we have pids in pg_locks which don’t exist
> in pg_stat_activity. See query below
>
>
>
> select b.relname,a.* from pg_locks a
>
> join pg_class b
>
> on a.relation = b.relfilenode
>
> where pid not in
>
> (
>
> select procpid from pg_stat_activity
>
> )
>
> and b.relname = 'test_prod_dim3'
>
>
>
> test_prod_dim3               relation 1075919
> 1710089                (null)     (null)     (null)     (null)
> (null)     (null)                109830069           393937
> AccessShareLock              true       647         false      -10000
>
>
>
>
>
> So this dont work
>
>
>
> SELECT pg_terminate_backend(393937)
>
>
>
>
>
> Thanks
>
>
>
>
>
> *From:* Grant Krieger
> *Sent:* 21 September 2017 11:01 AM
> *To:* user@hawq.incubator.apache.org
> *Subject:* RE: Lock is retained on a table
>
>
>
> Hi Guys,
>
>
>
> Thanks all for your help.
>
>
>
> I have scanned the log and there is no fatal message.
>
>
>
> On simulation via psql via putty there seems to be no problem however it
> appears to exist when connecting via JDBC. If I close the JDBC client via
> Aqua data studio on the last select statement the lock remains. Maybe it’s
> a JDBC setting I need to change.
>
>
>
> Below is a test script which loads data then a test query
>
>
>
> --- loading dummy data
>
>
>
> DROP
>
>       TABLE if EXISTS PUBLIC.test_prod_dim3;
>
> CREATE
>
>       TABLE PUBLIC.test_prod_dim3
>
>       (
>
>             productid int4 NULL,
>
>             description text NULL
>
>       )
>
>       WITH
>
>       (
>
>             APPENDONLY = TRUE,
>
>             COMPRESSTYPE = SNAPPY,
>
>             OIDS = FALSE
>
>       )
>
>       DISTRIBUTED RANDOMLY;
>
> INSERT
>
> INTO
>
>       PUBLIC.test_prod_dim3
>
> SELECT
>
>       1       AS productid,
>
>       '1 desc' AS description
>
> FROM
>
>       generate_series(1,100000000);
>
> DROP
>
>       TABLE if EXISTS PUBLIC.test_table_fact3;
>
> CREATE
>
>       TABLE PUBLIC.test_table_fact3
>
>       (
>
>             productid int4 NULL,
>
>             value1 int4 NULL
>
>       )
>
>       WITH
>
>       (
>
>             APPENDONLY = TRUE,
>
>             COMPRESSTYPE = SNAPPY,
>
>             OIDS = FALSE
>
>       )
>
>       DISTRIBUTED RANDOMLY;
>
> INSERT
>
> INTO
>
>       PUBLIC.test_table_fact3
>
> SELECT
>
>       1 AS productid,
>
>       1 AS value1
>
> FROM
>
>       generate_series(1,100000000);
>
>       --End loading dummy data
>
>
>
>
>
>       --Starting test query. Note this only works when I close JDBC client
> with CTRL ALT DELETE when this step is running
>
>
>
> SELECT
>
>       *
>
> FROM
>
>       (     SELECT
>
>                   a.*,
>
>                   b.description
>
>             FROM
>
>                   PUBLIC.test_table_fact3 a
>
>                   JOIN PUBLIC.test_prod_dim3 b
>
>                   ON a.productid = b.productid ) AS q LIMIT 10000;
>
>
>
>
>
> --Then in a new session
>
>
>
> drop table public.test_prod_dim3;
>
>
>
> select * from pg_stat_activity
>
> where procpid = 335492
>
>
>
> 1075919     edw_performance3  335492      342   10    gpadmin     drop
> table public.test_prod_dim3   true  2017/09/21 10:51:22 AM  2017/09/21
> 8:58:55 AM   10.1.3.201      61257       2017/09/21 10:51:22 AM  false
>
>
>
> select * from pg_locks
>
> where pid = 335492
>
>
>
> relation    1075919     1710089     (null)      (null)      (null)
> (null)      (null)      (null)      109830156   335492
> AccessExclusiveLock     false 342      false -10000
>
> transactionid     (null)      (null)      (null)      (null)
> 109830156   (null)      (null)      (null)      109830156   335492
> ExclusiveLock     true  342      false -10000
>
> relation    1075919     1259  (null)      (null)      (null)
> (null)      (null)      (null)      109830156   335492
> RowExclusiveLock  true  342   false -10000
>
> relation    1075919     2608  (null)      (null)      (null)
> (null)      (null)      (null)      109830156   335492
> RowExclusiveLock  true  342   false -10000
>
> relation    1075919     1247  (null)      (null)      (null)
> (null)      (null)      (null)      109830156   335492
> RowExclusiveLock  true  342   false -10000
>
>
>
>
>
>
>
> *From:* Shubham Sharma [mailto:ssharma@pivotal.io <ssharma@pivotal.io>]
> *Sent:* 20 September 2017 11:52 PM
> *To:* user@hawq.incubator.apache.org
> *Subject:* Re: Lock is retained on a table
>
>
>
> Hi Grant,
>
>
>
> Will it be feasible for you to share a reproducible case where you see
> this behavior. I have tried the following and don't see the behavior as you
> decribed
>
>
>
> [gpadmin@hdp3 ~]$ cat simple_select_query.sql
>
> select * from t1
>
> psql -c "select * from pg_locks";
>
>    locktype    | database | relation | page | tuple | transactionid |
> classid | objid | objsubid | transaction |  pid   |      mode       |
> granted | mppsessionid | mppiswriter | gp_segment_id
>
> ---------------+----------+----------+------+-------+-------
> --------+---------+-------+----------+-------------+--------
> +-----------------+---------+--------------+-------------+---------------
>
>  transactionid |          |          |      |       |       1639719 |
>     |       |          |     1639719 | 748151 | ExclusiveLock   | t       |
>         1720 | f           |        -10000
>
>  relation      |  1311369 |    10335 |      |       |               |
>     |       |          |     1639719 | 748151 | AccessShareLock | t       |
>         1720 | f           |        -10000
>
> (2 rows)
>
>
>
> [gpadmin@hdp3 ~]$ nohup psql -f simple_select_query.sql &
>
> [1] 748153
>
> [gpadmin@hdp3 ~]$ kill -9 748153
>
>
>
> [gpadmin@hdp3 ~]$ psql -c "select * from pg_locks";
>
>    locktype    | database | relation | page | tuple | transactionid |
> classid | objid | objsubid | transaction |  pid   |      mode       |
> granted | mppsessionid | mppiswriter | gp_segment_id
>
> ---------------+----------+----------+------+-------+-------
> --------+---------+-------+----------+-------------+--------
> +-----------------+---------+--------------+-------------+---------------
>
>  relation      |  1311369 |    10335 |      |       |               |
>     |       |          |     1639725 | 748410 | AccessShareLock | t       |
>         1723 | f           |        -10000
>
>  transactionid |          |          |      |       |       1639725 |
>     |       |          |     1639725 | 748410 | ExclusiveLock   | t       |
>         1723 | f           |        -10000
>
> (2 rows)
>
>
>
> I don't see a dangling exclusive lock. Is that something similar what you
> mentioned in your approach.
>
>
>
> On Wed, Sep 20, 2017 at 6:10 AM, Yi JIN <jinyi.apache@gmail.com> wrote:
>
> Hi Grant,
>
>
>
> Most likely it is a bug. Can you have a look the master log files, to
> check if there are some FATAL logs? And could you extract those FATAL lines
> out ?
>
>
>
> Best,
>
> Yi
>
>
>
> On Wed, Sep 20, 2017 at 10:18 PM, Grant Krieger <GKrieger@mrpg.com> wrote:
>
> Hi ,
>
>
>
> I have come across a funny in hawq 2.2.0.0. If we do a long running simple
> select only query on a table and we illegally terminate the client session
> (Just close the query tool window without cancelling) an exclusive lock is
> held on the underlying table until a server restart is done.
>
> This is even when the query has completed on the server.
>
> The terminate backend or Linux kill has no effect
>
>
>
> Any ideas?
>
>
>
> Thanks
>
>
>
> Grant Krieger
>
>
>
> [image: Mr PriceGroup Limited]
>
> *Grant Krieger *
> *Portfolio Lead *
> mrp IT
> *T:* +27313108478 <+27%2031%20310%208478>
> *E:* gkrieger@mrpg.com
> *W:* http://www.mrpricegroup.com
>
>
>
> The information contained in this communication from gkrieger@mrpg.com is
> confidential and may be legally privileged. This email is intended solely
> for use by the addressee and others authorized to receive it, and should
> not be read if delivered in error. If you are not the addressee you are
> hereby notified that any disclosure, copying, distribution or action taken
> or omitted in reliance of the contents of this information is strictly
> prohibited and may be unlawful. Please notify the sender immediately if the
> e-mail has unintentionally reached you. The email legal notice that governs
> the transmission of this electronic communication is available online at Mr
> Price Group Limited (http://www.mrpricegroup.com/LegalNotice.aspx). If
> you cannot access the legal notice, kindly contact us on +27 800 21 2535
> and a copy will be provided to you.
>
> Mr Price Group is an Authorised Financial Services and Credit Provider:
> FSP31450 and NCRCP46.
>
> PPlease consider the environment before printing this email.
>
>
>
>
>
>
>
> --
>
> Regards,
>
> Shubham Sharma
>
> Staff Customer Engineer
>
> Pivotal Global Support Services
>
> ssharma@pivotal.io
>
> Direct Tel: +1(510)-304-8201 <(510)%20304-8201>
>
> Office Hours: Mon-Fri 9:00 am to 5:00 pm PDT
>
> Out of Office Hours Contact +1 877-477-2269 <(877)%20477-2269>
>
>
>
>
>

Mime
View raw message