hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16212) Many connections to datanode are created when doing a large scan
Date Mon, 18 Jul 2016 21:59:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383156#comment-15383156
] 

stack commented on HBASE-16212:
-------------------------------

Tell us more [~dengzh]? I think I get it. The thread local often held a reference to a header
from another file altogether, and this was making for all the logging you were seeing?

Looking at the patch, you are making substantial changes: removing the thread local that caches
the last header read by each thread and instead doing the caching on the FSReaderImpl. That is better
in some ways, but now we have a synchronization bottleneck that all threads must pass through.
What are you thinking here? Are you thinking it will be rare that more than one thread goes
against the same file? Have you run with this patch?

Is this patch for branch-1.1? Does master still have the same issue? (It has the same basic form, but a
bunch of refactoring has gone on in there.)

This patch looks like a nice one. Thanks.
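For illustration, here is a minimal, hypothetical sketch of the two caching strategies under discussion. The class and method names (BlockHeader, ThreadLocalHeaderCache, SharedHeaderCache) are invented for this sketch and are not the actual HBase classes; the real FSReaderImpl is considerably more involved.

```java
public class HeaderCacheSketch {
    static final class BlockHeader {
        final long offset;
        BlockHeader(long offset) { this.offset = offset; }
    }

    // Old approach: each thread caches the last header it read. No contention,
    // but the cached header can belong to a different file than the one this
    // reader is serving -- the stale-reference problem described above.
    static final class ThreadLocalHeaderCache {
        private final ThreadLocal<BlockHeader> last = new ThreadLocal<>();
        BlockHeader get() { return last.get(); }
        void put(BlockHeader h) { last.set(h); }
    }

    // Patch approach: the cache lives on the reader itself, so it can only
    // ever hold a header from this reader's file, but every thread now goes
    // through the same synchronized slot -- the potential bottleneck.
    static final class SharedHeaderCache {
        private BlockHeader last;
        synchronized BlockHeader get() { return last; }
        synchronized void put(BlockHeader h) { last = h; }
    }

    public static void main(String[] args) {
        SharedHeaderCache shared = new SharedHeaderCache();
        shared.put(new BlockHeader(1024L));
        System.out.println("cached header offset: " + shared.get().offset);
    }
}
```

The tradeoff is exactly the one raised above: the shared cache is always consistent with its file, but whether the synchronization cost matters depends on how often multiple threads read the same file concurrently.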





> Many connections to datanode are created when doing a large scan 
> -----------------------------------------------------------------
>
>                 Key: HBASE-16212
>                 URL: https://issues.apache.org/jira/browse/HBASE-16212
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.1.2
>            Reporter: Zhihua Deng
>         Attachments: HBASE-16212.patch, HBASE-16212.v2.patch, regionserver-dfsinputstream.log
>
>
> As described in https://issues.apache.org/jira/browse/HDFS-8659, the datanode is suffering
from logging the same messages repeatedly. Adding logging to DFSInputStream produces output like the following:
> 2016-07-10 21:31:42,147 INFO  [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient:
DFSClient_NONMAPREDUCE_1984924661_1 seek DatanodeInfoWithStorage[10.130.1.29:50010,DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK]
for BP-360285305-10.130.1.11-1444619256876:blk_1109360829_35627143. pos: 111506876, targetPos:
111506843
>  ...
> As the pos of this input stream is larger than targetPos (the position being sought to), a new
connection to the datanode will be created, and the older one will be closed as a consequence.
When such wrong seek ops are numerous, the datanode's block scanner info messages spam the
logs, and many connections to the same datanode are created.
> hadoop version: 2.7.1
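The reconnect behavior described in the report can be sketched as follows. This is a hypothetical simplification, not the real DFSInputStream: the invented SeekSketch class only models the rule that a backward seek (targetPos < pos) cannot be served by the current block reader and so forces a reconnect, while a forward seek just skips ahead on the existing connection.

```java
public class SeekSketch {
    private long pos;
    private int reconnects;

    SeekSketch(long startPos) { this.pos = startPos; }

    void seek(long targetPos) {
        if (targetPos < pos) {
            // Backward seek: the current block reader cannot rewind, so the
            // connection is torn down and reopened -- the connection churn
            // and datanode log spam described in the report.
            reconnects++;
        }
        // A forward seek just skips ahead on the existing connection.
        pos = targetPos;
    }

    int reconnects() { return reconnects; }

    public static void main(String[] args) {
        // Mirrors the log line above: pos 111506876, targetPos 111506843.
        SeekSketch in = new SeekSketch(111506876L);
        in.seek(111506843L);
        System.out.println("reconnects after backward seek: " + in.reconnects());
    }
}
```

Each spurious backward seek in a large scan therefore costs a full connection teardown and re-establishment to the same datanode, which is what multiplies the block scanner log lines.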



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
