hbase-dev mailing list archives

From "HanRyong,Jung (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (HBASE-18454) RegionServer Do not close file descriptor when using shortcircuit
Date Tue, 29 Aug 2017 12:51:02 GMT

     [ https://issues.apache.org/jira/browse/HBASE-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

HanRyong,Jung reopened HBASE-18454:
-----------------------------------

The refCount of an HDFS ShortCircuitReplica has an initial value of 2:
one reference is held by ShortCircuitCache and one by the HDFS BlockReaderLocal.
The problem I found is that both HDFS and HBase need to be modified.
First, the ShortCircuitCacheCleaner in hdfs-client looks only at the expireTime when deciding
what to purge (delete) from the cache.
However, ShortCircuitReplica has a Slot, and I need code that purges (deletes) the replica via its Slot.
Second, HBase checks the status of the HDFS client's BlockReaderLocal lazily.
So even if the cache is purged in ShortCircuitCacheCleaner, the refCount on the HDFS client side
stays at 1 as long as there is no access to the HFile.
HBase needs to periodically check and close the HDFS client's BlockReaderLocal.
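
The reference-counting behaviour described above can be sketched as follows. This is a minimal illustration only: ReplicaSketch and CacheCleanerSketch are hypothetical stand-ins, not the real ShortCircuitReplica or ShortCircuitCacheCleaner classes, and the boolean slotValid is an assumed simplification of the Slot state. It is not the patch from this issue. It shows a cleaner that purges on expiry or on an invalid slot, and why the descriptors stay open until the reader also drops its reference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model; these are NOT the real HDFS classes.
class ReplicaSketch {
    // One reference for the cache, one for the reader that opened the replica.
    private int refCount = 2;
    private final long insertedAtMs;
    private boolean slotValid = true; // assumed stand-in for the Slot state
    private boolean closed = false;

    ReplicaSketch(long insertedAtMs) { this.insertedAtMs = insertedAtMs; }

    void invalidateSlot() { slotValid = false; }
    boolean isSlotValid() { return slotValid; }
    long insertedAtMs() { return insertedAtMs; }
    int refCount() { return refCount; }
    boolean isClosed() { return closed; }

    // Drop one reference; the descriptors are released only when the count hits zero.
    void unref() {
        if (refCount <= 0) throw new IllegalStateException("already released");
        if (--refCount == 0) closed = true;
    }
}

class CacheCleanerSketch {
    private final List<ReplicaSketch> cached = new ArrayList<>();
    private final long expiryMs;

    CacheCleanerSketch(long expiryMs) { this.expiryMs = expiryMs; }

    void add(ReplicaSketch r) { cached.add(r); }
    int size() { return cached.size(); }

    // Purge a replica when it has expired OR its slot is no longer valid,
    // dropping only the cache's own reference. Returns the number purged.
    int purge(long nowMs) {
        int purged = 0;
        for (Iterator<ReplicaSketch> it = cached.iterator(); it.hasNext();) {
            ReplicaSketch r = it.next();
            boolean expired = nowMs - r.insertedAtMs() >= expiryMs;
            if (expired || !r.isSlotValid()) {
                it.remove();
                r.unref();
                purged++;
            }
        }
        return purged;
    }
}
```

In this model a purged replica still has refCount 1 and is not closed until the reader side calls unref(), which mirrors the second point above.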

I have added the following code to ShortCircuitCacheCleaner to solve this problem.
This workaround applies only to HBase and is a very temporary fix.

> RegionServer Do not close file descriptor when using shortcircuit
> -----------------------------------------------------------------
>
>                 Key: HBASE-18454
>                 URL: https://issues.apache.org/jira/browse/HBASE-18454
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.2.6
>         Environment: HDFS 2.7.3, HBASE 1.2.6, centOS 6.8
>            Reporter: HanRyong,Jung
>
> I am running HDFS 2.7.3 and HBase 1.2.6 on CentOS 6.8.
> The regionserver uses 11 hard disks (JBOD) with HBase short-circuit reads enabled.
> When one disk failed in HDFS and I hot-swapped it, I found that HBase did not close its file descriptors on that disk.
> The fd paths on the unmounted disk also change to an incorrect path.
> In /proc/regionserver_pid/fd, if a descriptor pointed under /data1/volumn and I unmounted data1, its path changed to /volumn.
> Many of the file descriptors used for short-circuit reads are also in the deleted state.
> Example:
> ls -al /proc/regionserver_pid/fd
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 946 -> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490 (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 947 -> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490_141511919.meta (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 948 -> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080 (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 949 -> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080_141513509.meta (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 902 -> /volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir244/subdir160/blk_1257545757 (deleted)
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      
> Example: fuser output after data4 fails:
> /sbin/fuser -cu /data4
> Cannot stat file /proc/regionserver_pid/fd/192: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1282: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1283: input/output error
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      .
>                                                      
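
The symptom in the quoted description, descriptors under /proc/<pid>/fd whose link targets end in "(deleted)", can also be checked programmatically. The following is a minimal sketch assuming a Linux /proc filesystem. DeletedFdScan and its methods are hypothetical names and are not part of HBase or HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase or HDFS.
class DeletedFdScan {
    // Linux appends " (deleted)" to the link target of an fd whose file was unlinked.
    static boolean isDeletedTarget(String target) {
        return target.endsWith(" (deleted)");
    }

    // Keep only the targets that refer to deleted files.
    static List<String> deletedTargets(List<String> targets) {
        List<String> out = new ArrayList<>();
        for (String t : targets) {
            if (isDeletedTarget(t)) out.add(t);
        }
        return out;
    }

    // Read the symlink target of every entry in /proc/<pid>/fd (Linux only).
    static List<String> readFdTargets(String pid) throws IOException {
        List<String> targets = new ArrayList<>();
        try (DirectoryStream<Path> fds =
                 Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // Skip entries that cannot be read, e.g. an fd closed
                    // after the directory was listed.
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Pass the regionserver pid as the first argument; defaults to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        for (String t : deletedTargets(readFdTargets(pid))) {
            System.out.println(t);
        }
    }
}
```

Run it as the same user as the regionserver (or as root), since /proc/<pid>/fd is readable only by the process owner.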



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
