hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Disk hot swap for data node while hbase use short-circuit
Date Sun, 02 Jun 2019 01:45:37 GMT
Reminds me of https://issues.apache.org/jira/browse/HBASE-21915 too. 
Agree with Wei-Chiu that I'd start by ruling out HDFS issues first, and 
then start worrying about HBase issues :)

On 6/1/19 8:05 PM, Wei-Chiu Chuang wrote:
> I think I found a similar bug report that matches your symptom: HDFS-12204
> <https://issues.apache.org/jira/browse/HDFS-12204> (Dfsclient Do not close
> file descriptor when using shortcircuit)
> 
> On Wed, May 29, 2019 at 11:37 PM Kang Minwoo <minwoo.kang@outlook.com>
> wrote:
> 
>> I think these files were opened for reads, because the blocks are finalized.
>>
>> ---
>> ls -al /proc/regionserver_pid/fd
>> 902 -> /data_path/current/finalized/~/blk_1 (deleted)
>> 946 -> /data_path/current/finalized/~/blk_2 (deleted)
>> 947 -> /data_path/current/finalized/~/blk_3.meta (deleted)
>> ---
>>
>> I think it is not an HBase bug. This is because DFSClient checks for stale
>> FDs when the fetch method is invoked.
>>
>> Best regards,
>> Minwoo Kang
>>
>> ________________________________________
>> From: Wei-Chiu Chuang <weichiu@cloudera.com.INVALID>
>> Sent: Wednesday, May 29, 2019 20:51
>> To: user@hbase.apache.org
>> Subject: Re: Disk hot swap for data node while hbase use short-circuit
>>
>> Do you have a list of the files that were being opened? I'd like to know if
>> those are files opened for writes or for reads.
>>
>> If you are on a more recent version of Hadoop (2.8.0 and above),
>> there's an HDFS command to interrupt ongoing writes to DataNodes (HDFS-9945
>> <https://issues.apache.org/jira/browse/HDFS-9945>)
>>
>>
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin
>> hdfs dfsadmin -evictWriters
>>
>> Looking at the HDFS hot-swap implementation, it looks like the DataNode
>> doesn't interrupt writers when a volume is removed. That sounds like a bug.
>>
>> On Tue, May 28, 2019 at 9:39 PM Kang Minwoo <minwoo.kang@outlook.com>
>> wrote:
>>
>>> Hello, Users.
>>>
>>> I use JBOD for my DataNodes. Sometimes a disk in a DataNode develops a
>>> problem.
>>>
>>> At first, I shut down every instance, including the DataNode and the
>>> RegionServer, on the machine with the disk problem.
>>> But that is not a good solution, so I improved the process.
>>>
>>> Now, when I detect a disk problem on a server, I just perform a disk hot
>>> swap.
>>>
>>> But the system administrators complain that some FDs are still open, so
>>> they cannot remove the disk.
>>> The RegionServer holds the FDs; I use the short-circuit read feature
>>> (HBase version 1.2.9).
>>>
>>> When we first hit this issue, we force-unmounted the disk and remounted it.
>>> But after that, the kernel reported an error [1].
>>>
>>> So now we avoid the issue by purging the stale FDs.
>>>
>>> I think this issue is common.
>>> I want to know how HBase users deal with it.
>>>
>>> Thank you very much for sharing your experience.
>>>
>>> Best regards,
>>> Minwoo Kang
>>>
>>> [1]:
>>> https://www.thegeekdiary.com/xfs_log_force-error-5-returned-xfs-error-centos-rhel-7/
>>>
>>
> 
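
The /proc listing quoted in the thread can be reproduced programmatically.
The following is only a minimal sketch, assuming a Linux host where the
kernel appends " (deleted)" to the link target of an unlinked file under
/proc/<pid>/fd. The function name find_deleted_fds and the proc_root
parameter are hypothetical names chosen for illustration, not part of HBase
or HDFS.

```python
import os


def find_deleted_fds(pid, path_prefix, proc_root="/proc"):
    """Return sorted (fd, path) pairs for descriptors of `pid` that point at
    deleted files under `path_prefix`.

    `proc_root` is a hypothetical parameter added so the function can be
    exercised against a fake directory tree; on a real host leave it as /proc.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    suffix = " (deleted)"  # marker Linux appends for unlinked files
    results = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith(path_prefix) and target.endswith(suffix):
            results.append((int(name), target[: -len(suffix)]))
    return sorted(results)
```

Calling find_deleted_fds(regionserver_pid, "/data_path") would list the
descriptors that still pin the removed volume. Reading another process's
/proc/<pid>/fd normally requires running as the same user or as root. The
sketch only reports descriptors; it does not close them.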
