hawq-dev mailing list archives

From Harald Bögeholz (JIRA) <j...@apache.org>
Subject [jira] [Created] (HAWQ-1498) Segments keep open file descriptors for deleted files
Date Thu, 06 Jul 2017 03:05:00 GMT
Harald Bögeholz created HAWQ-1498:
-------------------------------------

             Summary: Segments keep open file descriptors for deleted files
                 Key: HAWQ-1498
                 URL: https://issues.apache.org/jira/browse/HAWQ-1498
             Project: Apache HAWQ
          Issue Type: Bug
            Reporter: Harald Bögeholz
            Assignee: Radar Lei
             Fix For: 2.2.0.0-incubating


I have been running some large computations in HAWQ using psql on the master. These computations
created temporary tables and dropped them again. Nevertheless, free disk space in HDFS decreased
by much more than it should have. While the psql session on the master was still open, I investigated
on one of the slave machines.
The HDFS data is stored on /mds:

{noformat}
[root@mds-hdp-04 ~]# ls -l /mds
total 36
drwxr-xr-x. 3 root      root    4096 Jun 14 04:23 falcon
drwxr-xr-x. 3 root      root    4096 Jun 14 04:42 hdfs
drwx------. 2 root      root   16384 Jun  8 02:48 lost+found
drwxr-xr-x. 5 storm     hadoop  4096 Jun 14 04:45 storm
drwxr-xr-x. 4 root      root    4096 Jun 14 04:43 yarn
drwxr-xr-x. 2 zookeeper hadoop  4096 Jun 14 04:39 zookeeper
[root@mds-hdp-04 ~]# df /mds
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/vdc       515928320 314560220 175137316  65% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952	/mds
{noformat}
Note that there is a difference of more than 200 GB between the disk space used according to
df and the sum of all files on that file system according to du.
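
To make the size of the gap concrete, the following minimal sketch subtracts the du total from the df "Used" figure. Both numbers are copied from the output above and are in 1K blocks; the variable names are only illustrative.

```shell
# Values copied from the df and du output above (units: 1K blocks).
df_used_kb=314560220
du_used_kb=89918952

# Space that df counts as used but du cannot attribute to any visible file.
diff_kb=$((df_used_kb - du_used_kb))
diff_gib=$((diff_kb / 1024 / 1024))   # integer division, rounds down

echo "unaccounted: ${diff_kb} KB (about ${diff_gib} GiB)"
# prints: unaccounted: 224641268 KB (about 214 GiB)
```
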
I have found the culprit to be several postgres processes running as gpadmin and holding open
file descriptors to deleted files. Here are the first few:

{noformat}
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
postgres 665334 gpadmin   18r   REG 253,32 134217728     0  9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
postgres 665334 gpadmin   34r   REG 253,32     24488     0  9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
postgres 665334 gpadmin   35r   REG 253,32       199     0  9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
postgres 665334 gpadmin   37r   REG 253,32 134217728     0  9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
postgres 665334 gpadmin   38r   REG 253,32   1048583     0  9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
postgres 665334 gpadmin   39r   REG 253,32   1048583     0  9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
postgres 665334 gpadmin   40r   REG 253,32 134217728     0  9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
postgres 665334 gpadmin   41r   REG 253,32   1048583     0  9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
postgres 665334 gpadmin   42r   REG 253,32 134217728     0  9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
postgres 665334 gpadmin   43r   REG 253,32   1048583     0  9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
{noformat}
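
To see how much space each process is pinning, the sizes in this listing can be summed per PID. The sketch below assumes the column layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME); other lsof versions or options may add columns, in which case the field numbers need adjusting. The function name sum_deleted_by_pid is hypothetical.

```shell
# Read `lsof +L1` output on stdin and print "PID total_bytes" for every
# process holding unlinked regular files.
# Assumed fields: $2 = PID, $5 = TYPE, $7 = SIZE/OFF, $8 = NLINK.
sum_deleted_by_pid() {
    awk '$5 == "REG" && $8 == 0 { bytes[$2] += $7 }
         END { for (pid in bytes) printf "%s %.0f\n", pid, bytes[pid] }'
}
```

On the affected machine it could be used as `lsof +L1 | grep /mds/hdfs | sum_deleted_by_pid`.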
As soon as I close the psql session on the master, the disk space is freed on the slaves:

{noformat}
[root@mds-hdp-04 ~]# df /mds
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vdc       515928320 89992720 399704816  19% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952	/mds
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
{noformat}
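
The underlying mechanism is ordinary POSIX behavior: a deleted file's blocks are only released once the last open descriptor on it is closed. The following is a small stand-alone sketch of that effect using a temporary file; it does not involve HAWQ or HDFS, and the file contents and descriptor number are arbitrary.

```shell
# Create a small file and open it for reading on descriptor 3.
tmp=$(mktemp)
echo "still here" > "$tmp"
exec 3< "$tmp"

# Remove the directory entry; the data remains while fd 3 stays open.
rm "$tmp"
[ -e "$tmp" ] || echo "path is gone"

# The contents can still be read through the open descriptor.
content=$(cat <&3)
echo "$content"

# Closing the descriptor is what finally lets the file system free the space.
exec 3<&-
```
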

I believe this to be a bug. At the very least, it looks like very undesirable behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
