hbase-user mailing list archives

From 张铎(Duo Zhang) <palomino...@gmail.com>
Subject Re: Query occasionally responds very slowly
Date Tue, 23 Jan 2018 08:53:11 GMT
There are lots of problems that can lead to slow responses, so we need to
collect more information to locate the real problem.

And the datanode log you provided is for the write path, not the read path. The
connection reset happens because the destination datanode finds that the block
is already there, so it closes the connection and the source datanode gets a
connection reset.

2018-01-23 16:37 GMT+08:00 聪聪 <175998806@qq.com>:

> Thank you very much for your reply.
> The probability of this slow query is very low, but it has a great impact
> on the business. Do I need to keep dumping jstack all the time? Have you
> ever run into this situation before?
>
>
> ------------------ Original Message ------------------
> From: "Yu Li" <carp84@gmail.com>
> Sent: Tuesday, January 23, 2018, 4:23 PM
> To: "Hbase-User" <user@hbase.apache.org>
>
> Subject: Re: Query occasionally responds very slowly
>
>
>
> 0.98.6 is a really old version and doesn't include some later improvements
> which could help locate the issue, such as HBASE-16033
> <https://issues.apache.org/jira/browse/HBASE-16033> (it includes the row of
> the slow query in the log message so we could repeat the query in hbase shell
> and try to reproduce the issue; available from 0.98.21) and HBASE-15160
> <https://issues.apache.org/jira/browse/HBASE-15160> (it adds metrics on HDFS
> operations so we could check whether there was any IO spike at the time of
> the slow response; available from 1.4.0). So my first suggestion is to
> upgrade your HBase version (especially since branch-0.98 is already EOL,
> FYI), or manually backport these patches to your version and try.
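>
> For illustration, if you want to replay a suspect Get from a standalone Java
> client instead of hbase shell and time it on the client side, a rough sketch
> against the 0.98-era client API could look like the following. The table name
> "t1" and row key "r1" are placeholders only; fill in the actual table and row
> of a suspect query once you know them:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Get;
>   import org.apache.hadoop.hbase.client.HConnection;
>   import org.apache.hadoop.hbase.client.HConnectionManager;
>   import org.apache.hadoop.hbase.client.HTableInterface;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class ReplayGet {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       // 0.98-style connection API; "t1" and "r1" are placeholders.
>       try (HConnection conn = HConnectionManager.createConnection(conf);
>            HTableInterface table = conn.getTable("t1")) {
>         Get get = new Get(Bytes.toBytes("r1"));
>         long start = System.currentTimeMillis();
>         Result result = table.get(get);
>         long elapsedMs = System.currentTimeMillis() - start;
>         System.out.println("Get took " + elapsedMs + " ms, cells returned: " + result.size());
>       }
>     }
>   }
>
> Running this in a loop against the same row makes it easier to catch one of
> the slow responses outside of your application path.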
>
> If upgrading is impossible, then from the limited information posted I could
> only say the DN log seems unrelated to the issue. In my view, the most
> effective way to locate the problem is to dump the jstack of the RS while a
> slow query is happening and check where it's waiting (the slow queries last
> for more than 20 seconds, so if they happen frequently, there's a high
> chance of catching one).
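>
> As a rough sketch of one way to automate that (the output directory and the
> sampling interval below are placeholders, and jstack must run as the same
> user as the RS), a small sampler can keep periodic thread dumps so that a
> 20+ second slow query is very likely to be covered by at least one of them:
>
>   import java.io.File;
>   import java.text.SimpleDateFormat;
>   import java.util.Date;
>
>   public class JstackSampler {
>     public static void main(String[] args) throws Exception {
>       String rsPid = args[0];                       // pid of the RegionServer process
>       File outDir = new File("/tmp/rs-jstack");     // placeholder output directory
>       outDir.mkdirs();
>       SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss");
>       while (true) {
>         File out = new File(outDir, "jstack-" + fmt.format(new Date()) + ".txt");
>         // jstack ships with the JDK; collect stdout and stderr into the dump file.
>         new ProcessBuilder("jstack", rsPid)
>             .redirectErrorStream(true)
>             .redirectOutput(out)
>             .start()
>             .waitFor();
>         Thread.sleep(10000L);                       // sample every 10 seconds
>       }
>     }
>   }
>
> Then match the dump timestamps against the responseTooSlow entries in the RS
> log to see which handler thread was stuck and where it was waiting.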
>
> Hope this information helps, and good luck.
>
> Best Regards,
> Yu
>
> On 23 January 2018 at 15:46, 聪聪 <175998806@qq.com> wrote:
>
> > The hbase version is 0.98.6-cdh5.2.0.
> > The HDFS version is 2.5.0-cdh5.2.0.
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京" <175998806@qq.com>
> > Sent: Tuesday, January 23, 2018, 2:50 PM
> > To: "user" <user@hbase.apache.org>
> >
> > Subject: Query occasionally responds very slowly
> >
> >
> >
> > Recently, queries occasionally respond very slowly. These queries usually
> > return quickly, within a few milliseconds, but occasionally one gets very
> > slow, taking more than 20 seconds. I looked at the GC log and there was no
> > full GC happening.
> >
> >
> > A regionserver log is as follows:
> > 2018-01-22 16:38:13,580 WARN  [B.defaultRpcServer.handler=35,queue=5,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":23513,"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","client":"10.94.76.216:34324","starttimems":1516610270064,"queuetimems":0,"class":"HRegionServer","responsesize":412,"method":"Get"}
> >
> >
> > One of the datanode logs is as follows:
> > 2018-01-22 16:37:42,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54469, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 948224, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121353497_47612799, duration: 9866301
> > 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 src: /10.90.18.69:36293 dest: /10.90.18.70:50010
> > 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> > 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: l-hbase50.dba.cn2:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.90.18.69:36293 dst: /10.90.18.70:50010; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> > 2018-01-22 16:37:42,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54510, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 34276352, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121354564_47613866, duration: 7418016
> >
> >
> > Another datanode log is as follows:
> > 2018-01-22 16:37:42,497 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0) Starting thread to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010
> > 2018-01-22 16:37:42,499 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0):Failed to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010 got
> > java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
> >         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> >         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
> >         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
> >         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1805)
> >         at java.lang.Thread.run(Thread.java:744)
> > Caused by: java.io.IOException: Connection reset by peer
> >         ... 8 more
> > 2018-01-22 16:37:42,520 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.69:50010, dest: /10.90.18.69:49343, bytes: 14848, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase49.dba.cn2.qunar.com,60020,1464835349894_1899722521_37, offset: 61291520, srvID: 95aafbc6-239c-4661-ba37-4687ae9e663b, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121217415_47476717, duration: 5939553
> >
> >
> >
> > This question confuses me. What caused the problem, and how do we solve it?
> >
>
