hbase-user mailing list archives

From Kang Minwoo <minwoo.k...@outlook.com>
Subject RE: Region server dies at regular intervals for unknown reasons.
Date Fri, 10 Feb 2017 01:59:12 GMT
Here are the logs.


[{HOST1} Datanode Log]
2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for {BLOCK1}
2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: {BLOCK1}, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: {BLOCK1}, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2017-02-08 11:08:10,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock {BLOCK1} received exception java.io.IOException: Premature EOF from inputStream
2017-02-08 11:08:10,146 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: {HOST1}:{PORT1}:DataXceiver error processing WRITE_BLOCK operation  src: /{IP1}:{PORT2} dst: /{IP1}:{PORT1}

[{HOST1} RegionServer Log]
2017-02-08 11:08:05,023 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Client tried to access missing scanner
2017-02-08 11:08:07,617 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver.periodicFlusher requesting flush for region {table} after a delay of {N}
2017-02-08 11:08:07,618 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver.periodicFlusher requesting flush for region {table} after a delay of {N}
2017-02-08 11:08:11,831 INFO org.apache.zookeeper.ZooKeeper: Client environment:...
    ...
    (zookeeper client environment log)
    ...
2017-02-08 11:08:11,834 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection
2017-02-08 11:08:11,871 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server
2017-02-08 11:08:11,875 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
2017-02-08 11:08:11,880 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete

[Master Log]
2017-02-08 11:08:11,878 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [{HOST1}]
2017-02-08 11:08:12,442 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for {HOST1} before assignment
2017-02-08 11:08:12,444 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [{HOST1}]
2017-02-08 11:08:12,445 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting {N} logs in [hdfs://a/b/WALs/{HOST1}]
    ...
    (SplitLogManager log)
    ...
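
To line the three logs up against each other, it can help to merge them into a single timeline ordered by timestamp. The following is only a minimal Python sketch, assuming every entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpts above. The names `parse_entries` and `merge_logs` and the source labels are illustrative and not part of HBase or Hadoop.

```python
import re
from datetime import datetime

# Leading log4j-style timestamp, e.g. "2017-02-08 11:08:10,145".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+")


def parse_entries(source, lines):
    """Return (timestamp, source, text) tuples for one log.

    A line that does not start with a timestamp is appended to the
    previous entry, so wrapped lines stay together.
    """
    entries = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append([ts, source, line[match.end():].strip()])
        elif entries and line.strip():
            entries[-1][2] += " " + line.strip()
    return [tuple(entry) for entry in entries]


def merge_logs(logs):
    """Merge {source_name: iterable_of_lines} into one list sorted by time."""
    merged = []
    for source, lines in logs.items():
        merged.extend(parse_entries(source, lines))
    return sorted(merged, key=lambda entry: entry[0])


if __name__ == "__main__":
    # Sample lines copied from the excerpts in this message.
    sample = {
        "datanode": [
            "2017-02-08 11:08:10,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock",
            "{BLOCK1} received exception java.io.IOException: Premature EOF from inputStream",
        ],
        "master": [
            "2017-02-08 11:08:11,878 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer",
            "ephemeral node deleted, processing expiration [{HOST1}]",
        ],
    }
    for ts, source, text in merge_logs(sample):
        print(ts.isoformat(sep=" ", timespec="milliseconds"), f"[{source}]", text)
```

Sorting by the parsed timestamp only gives a meaningful order if the clocks on the hosts are reasonably in sync, which this sketch does not check.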


Yes, I'll check it out.
Thanks.

________________________________
From: Ted Yu <yuzhihong@gmail.com>
Sent: Thursday, February 9, 2017 10:43:58 PM
To: user@hbase.apache.org
Subject: Re: Region server dies at regular intervals for unknown reasons.

Can you pastebin the relevant logs from the region server and master around this time?

Please also check HDFS health.

> On Feb 9, 2017, at 3:44 AM, Kang Minwoo <minwoo.kang@outlook.com> wrote:
>
> The DataNode logged a java.io.IOException: Premature EOF from inputStream error.
>
> This error seems to have killed the region server.
>
> One second after this error, I found an error log on the master server:
>
> RegionServerTracker: RegionServer ephemeral node deleted
>
> Thanks
>
>
> ________________________________
> From: Ted Yu <yuzhihong@gmail.com>
> Sent: Wednesday, February 8, 2017 12:21:11 PM
> To: user@hbase.apache.org
> Subject: Re: Region server dies at regular intervals for unknown reasons.
>
> You can search in master log for the region backward.
> The log would tell you which region server last tried to open it.
>
> Pastebin the relevant snippet of region server log pertaining to the
> attempted open of the region.
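
The backward search described above could be sketched in Python roughly as follows. This is an illustration under assumptions: `last_mentions` is a hypothetical helper, it matches the region name as a plain substring, and it does not rely on any particular HBase log message format.

```python
from collections import deque


def last_mentions(lines, needle, limit=5):
    """Return up to `limit` of the most recent lines containing `needle`,
    newest first. `lines` can be any iterable of strings, such as an
    open log file."""
    hits = deque(maxlen=limit)  # keeps only the newest `limit` matches
    for line in lines:
        if needle in line:
            hits.append(line.rstrip("\n"))
    return list(reversed(hits))
```

For a real log one would pass the open master log file and the region's name, for example `last_mentions(open(path), region_name)`, where `path` and `region_name` are whatever applies on your cluster.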
>
> Thanks
>
>> On Tue, Feb 7, 2017 at 7:17 PM, Kang Minwoo <minwoo.kang@outlook.com> wrote:
>>
>> Yes. I agree with you.
>> But I cannot upgrade right away.
>>
>> The problem is that region servers that have received a particular region
>> continue to die.
>> I have the name of that region.
>>
>> What should I do to find out whether a region server dies when it receives
>> that region?
>>
>> Thanks.
>> ________________________________
>> From: Ted Yu <yuzhihong@gmail.com>
>> Sent: Tuesday, February 7, 2017 11:38:23 AM
>> To: user@hbase.apache.org
>> Subject: Re: Region server dies at regular intervals for unknown reasons.
>>
>> 0.96 is very old.
>>
>> Please consider upgrading.
>>
>> You can do rolling upgrade to 0.98 / 1.x releases (e.g. 1.1.8 or 1.3.0).
>>
>> On Mon, Feb 6, 2017 at 6:04 PM, Kang Minwoo <minwoo.kang@outlook.com>
>> wrote:
>>
>>> The versions I use are very old.
>>>
>>> hbase: 0.96.2
>>> hadoop: 2.4.1
>>>
>>> I did not run hbck.
>>>
>>> Thanks
>>> ________________________________
>>> From: Ted Yu <yuzhihong@gmail.com>
>>> Sent: Tuesday, February 7, 2017 10:40:28 AM
>>> To: user@hbase.apache.org
>>> Subject: Re: Region server dies at regular intervals for unknown reasons.
>>>
>>> Kang:
>>> Please let us know the release of hbase and hadoop you use.
>>>
>>> Did you run hbck around the time the region server crashed?
>>>
>>> If there was any inconsistency, please pastebin it as well.
>>>
>>> Thanks
>>>
>>> On Mon, Feb 6, 2017 at 5:36 PM, Ganesh Viswanathan <gansvv@gmail.com>
>>> wrote:
>>>
>>>> Check the GC logs for HBase and HDFS. Tail and post HBase logs from
>>>> regionserver and from active HBase master to help debug the root cause.
>>>>
>>>>
>>>>
>>>> On Mon, Feb 6, 2017 at 5:27 PM Kang Minwoo <minwoo.kang@outlook.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>> My region servers die at regular intervals for unknown reasons.
>>>>> I restarted HBase and region servers continued to die.
>>>>>
>>>>> I solved it by deleting the old WAL files.
>>>>>
>>>>> Now I'm going through the logs and trying to find the cause.
>>>>> But I do not know where to look.
>>>>>
>>>>> Please let me know what I need to watch carefully to find out why the
>>>>> region server is dying.
>>>>> I think it will be very helpful.
>>>>>
>>>>> Not all region servers were killed; only some region servers died.
>>>>>
>>>>> Thanks.
>>