hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Region server dies at regular intervals for unknown reasons.
Date Fri, 10 Feb 2017 02:24:21 GMT
The 'Premature EOF from inputStream' log was at INFO level - it may not be
critical.

Please pastebin more of the region server log when you reply.
Was there a long pause prior to 2017-02-08 11:08:11,878?
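
One way to look for such a pause is to scan the region server log for large gaps between consecutive timestamps. The sketch below is only an illustration and not part of HBase: the `find_gaps` helper and the 3-second threshold are hypothetical, and it assumes every log entry starts with a timestamp in the form shown in this thread. The sample lines reuse the region server timestamps quoted below, with the message text shortened.

```python
from datetime import datetime, timedelta

# Timestamp layout seen in this thread, e.g. "2017-02-08 11:08:11,878".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # number of characters in such a timestamp


def find_gaps(lines, threshold=timedelta(seconds=3)):
    """Return (previous, current, gap) for consecutive timestamps
    that are further apart than threshold (an arbitrary choice)."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # continuation line or stack trace, no timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Region server timestamps from this thread (message text shortened).
sample = [
    "2017-02-08 11:08:05,023 INFO HRegionServer: Client tried to access missing scanner",
    "2017-02-08 11:08:07,617 INFO HRegionServer: periodicFlusher requesting flush",
    "2017-02-08 11:08:07,618 INFO HRegionServer: periodicFlusher requesting flush",
    "2017-02-08 11:08:11,831 INFO ZooKeeper: Client environment:...",
]

for before, after, gap in find_gaps(sample):
    print(f"no log output between {before} and {after} ({gap})")
```

To run it on a real log, pass an open file instead of the sample list. A gap in log output only hints at a pause; the GC logs are still needed to confirm one.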

Thanks

On Thu, Feb 9, 2017 at 5:59 PM, Kang Minwoo <minwoo.kang@outlook.com> wrote:

> Here are logs.
>
>
> [{HOST1} Datanode Log]
> 2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Exception for {BLOCK1}
> 2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder: {BLOCK1}, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is
> interrupted.
> 2017-02-08 11:08:10,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder: {BLOCK1}, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2017-02-08 11:08:10,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> opWriteBlock {BLOCK1} received exception java.io.IOException: Premature EOF
> from inputStream
> 2017-02-08 11:08:10,146 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> {HOST1}:{PORT1}:DataXceiver error processing WRITE_BLOCK operation  src:
> /{IP1}:{PORT2} dst: /{IP1}:{PORT1}
>
> [{HOST1} RegionServer Log]
> 2017-02-08 11:08:05,023 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
> Client tried to access missing scanner
> 2017-02-08 11:08:07,617 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
> regionserver.periodicFlusher requesting flush for region {table} after a
> delay of {N}
> 2017-02-08 11:08:07,618 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
> regionserver.periodicFlusher requesting flush for region {table} after a
> delay of {N}
> 2017-02-08 11:08:11,831 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:...
>     ...
>     (zookeeper client environment log)
>     ...
> 2017-02-08 11:08:11,834 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection
> 2017-02-08 11:08:11,871 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server
> 2017-02-08 11:08:11,875 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established
> 2017-02-08 11:08:11,880 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete
>
> [Master Log]
> 2017-02-08 11:08:11,878 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker:
> RegionServer ephemeral node deleted, processing expiration [{HOST1}]
> 2017-02-08 11:08:12,442 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
> Splitting logs for {HOST1} before assignment
> 2017-02-08 11:08:12,444 INFO org.apache.hadoop.hbase.master.SplitLogManager:
> dead splitlog workers [{HOST1}]
> 2017-02-08 11:08:12,445 INFO org.apache.hadoop.hbase.master.SplitLogManager:
> started splitting {N} logs in [hdfs://a/b/WALs/{HOST1}]
>     ...
>     (SplitLogManager log)
>     ...
>
>
> Yes, I'll check it out.
> Thanks.
>
> ________________________________
> From: Ted Yu <yuzhihong@gmail.com>
> Sent: Thursday, February 9, 2017, 10:43:58 PM
> To: user@hbase.apache.org
> Subject: Re: Region server dies at regular intervals for unknown reasons.
>
> Can you pastebin the relevant logs from the region server and the master
> around this time?
>
> Please also check hdfs health.
>
> > On Feb 9, 2017, at 3:44 AM, Kang Minwoo <minwoo.kang@outlook.com> wrote:
> >
> > The DataNode threw a java.io.IOException: Premature EOF from
> > inputStream error.
> >
> > This error seems to have killed the region server.
> >
> > One second after this error, I found this error log on the master
> > server:
> >
> > RegionServerTracker: RegionServer ephemeral node deleted
> >
> > Thanks
> >
> >
> > ________________________________
> > From: Ted Yu <yuzhihong@gmail.com>
> > Sent: Wednesday, February 8, 2017, 12:21:11 PM
> > To: user@hbase.apache.org
> > Subject: Re: Region server dies at regular intervals for unknown reasons.
> >
> > You can search the master log backward for the region.
> > The log will tell you which region server last tried to open it.
> >
> > Pastebin the relevant snippet of the region server log pertaining to
> > the attempted open of the region.
> >
> > Thanks
> >
> >> On Tue, Feb 7, 2017 at 7:17 PM, Kang Minwoo <minwoo.kang@outlook.com> wrote:
> >>
> >> Yes. I agree with you.
> >> But I cannot upgrade right away.
> >>
> >> The problem is that region servers that have received a particular
> >> region continue to die.
> >> I got the name of that region.
> >>
> >> What should I do to find out whether a region server dies when it
> >> receives that region?
> >>
> >> Thanks.
> >> ________________________________
> >> From: Ted Yu <yuzhihong@gmail.com>
> >> Sent: Tuesday, February 7, 2017, 11:38:23 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: Region server dies at regular intervals for unknown reasons.
> >>
> >> 0.96 is very old.
> >>
> >> Please consider upgrading.
> >>
> >> You can do a rolling upgrade to 0.98 / 1.x releases (e.g. 1.1.8 or 1.3.0).
> >>
> >> On Mon, Feb 6, 2017 at 6:04 PM, Kang Minwoo <minwoo.kang@outlook.com>
> >> wrote:
> >>
> >>> The versions I use are very old.
> >>>
> >>> hbase: 0.96.2
> >>> hadoop: 2.4.1
> >>>
> >>> I did not run hbck.
> >>>
> >>> Thanks
> >>> ________________________________
> >>> From: Ted Yu <yuzhihong@gmail.com>
> >>> Sent: Tuesday, February 7, 2017, 10:40:28 AM
> >>> To: user@hbase.apache.org
> >>> Subject: Re: Region server dies at regular intervals for unknown reasons.
> >>>
> >>> Kang:
> >>> Please let us know which releases of hbase and hadoop you use.
> >>>
> >>> Did you run hbck around the time the region server crashed?
> >>>
> >>> If there was any inconsistency, please pastebin that as well.
> >>>
> >>> Thanks
> >>>
> >>> On Mon, Feb 6, 2017 at 5:36 PM, Ganesh Viswanathan <gansvv@gmail.com>
> >>> wrote:
> >>>
> >>>> Check the GC logs for HBase and HDFS. Tail and post HBase logs from
> >>>> regionserver and from active HBase master to help debug the root
> >>>> cause.
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Feb 6, 2017 at 5:27 PM Kang Minwoo <minwoo.kang@outlook.com>
> >>>> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>>
> >>>>> My region servers die at regular intervals for unknown reasons.
> >>>>> I restarted HBase, but the region servers continued to die.
> >>>>>
> >>>>> I solved it by removing the old WAL files.
> >>>>>
> >>>>> Now I'm going through the logs and trying to find the cause.
> >>>>> But I do not know where to look.
> >>>>>
> >>>>> Please let me know what I should look at carefully to find out why
> >>>>> the region servers are dying.
> >>>>> I think it will be very helpful.
> >>>>>
> >>>>> Not all region servers were killed; only some region servers died.
> >>>>>
> >>>>> Thanks.
> >>
>
