hbase-issues mailing list archives

From "Zhihua Deng (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-16212) Many connections to datanode are created when doing a large scan
Date Tue, 19 Jul 2016 06:25:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383565#comment-15383565 ]

Zhihua Deng edited comment on HBASE-16212 at 7/19/16 6:24 AM:
--------------------------------------------------------------

Changing from a ThreadLocal to synchronization does introduce a potential synchronization
bottleneck, but that is still cheaper than the extra I/O. So the question is how often the
connection is recreated for seeking + reading. The original ThreadLocal is declared as a
non-static private field, which means the FSReaderImpl instance that owns it is reused later
on; an input stream is also opened when the FSReaderImpl is created.
In the case described in the attached log, synchronization is better than the ThreadLocal
because the access pattern is a sequential read.
What about the concurrent case? The worst case is: Thread1.readBlockInternal -> Thread2.readBlockInternal
-> Thread3.readBlockInternal -> Thread1.readBlockInternal -> ....
In this case, synchronization creates as many connections as the ThreadLocal does.
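
To make the comparison above concrete, here is a rough toy model, not the HBase code. It
assumes that every read leaves the stream positioned just past a prefetched header of the
next block, that a backward seek costs exactly one new datanode connection (as in the issue
description), and that the worst case means each thread scans its own region of the file.
HeaderCacheModel, reconnects, BLOCK and the offsets are all hypothetical names and values;
HEADER is the 33-byte gap between pos and targetPos in the quoted log line.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderCacheModel {
    // Assumption: gap between pos and targetPos in the quoted log line (111506876 - 111506843).
    static final long HEADER = 33;
    // Hypothetical on-disk block size, header included.
    static final long BLOCK = 65_536;

    /**
     * Replays reads against one shared stream and counts backward seeks.
     * threads[i] reads the block starting at offsets[i].
     * sharedCache = true models one synchronized field, false models a ThreadLocal.
     */
    static int reconnects(int[] threads, long[] offsets, boolean sharedCache) {
        // Remembers, per cache owner, the offset of the block whose header was prefetched.
        Map<Integer, Long> prefetched = new HashMap<>();
        long pos = 0;
        int count = 0;
        for (int i = 0; i < threads.length; i++) {
            int owner = sharedCache ? 0 : threads[i];
            Long cached = prefetched.get(owner);
            boolean hit = cached != null && cached.longValue() == offsets[i];
            // On a hit the header is already in memory, so reading starts after it.
            long target = hit ? offsets[i] + HEADER : offsets[i];
            if (target < pos) {
                count++; // backward seek: assumed to open a new connection
            }
            // The read consumes the block plus the next block's header.
            pos = offsets[i] + BLOCK + HEADER;
            prefetched.put(owner, offsets[i] + BLOCK);
        }
        return count;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 3, 1, 2, 3};
        // One sequential scan whose consecutive blocks are read by different handler threads.
        long[] sequential = {0, BLOCK, 2 * BLOCK, 3 * BLOCK, 4 * BLOCK, 5 * BLOCK};
        // Three threads, each scanning its own distant region, strictly interleaved.
        long[] interleaved = {0, 1000 * BLOCK, 2000 * BLOCK, BLOCK, 1001 * BLOCK, 2001 * BLOCK};

        System.out.println("sequential,  ThreadLocal:  " + reconnects(threads, sequential, false));
        System.out.println("sequential,  synchronized: " + reconnects(threads, sequential, true));
        System.out.println("interleaved, ThreadLocal:  " + reconnects(threads, interleaved, false));
        System.out.println("interleaved, synchronized: " + reconnects(threads, interleaved, true));
    }
}
```

Under these assumptions the sequential scan reconnects on nearly every block with the
per-thread cache and not at all with the shared one, while the interleaved pattern gives the
same count either way. Real DFSInputStream behavior depends on more than seek direction, so
the numbers only illustrate the argument.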





was (Author: dengzh):
Changing from a ThreadLocal to a common field does introduce a potential synchronization
bottleneck, but that is still cheaper than the extra I/O. So the question is how often the
connection is recreated for seeking + reading. The original ThreadLocal is declared as a
non-static private field, which means the FSReaderImpl instance that owns it is reused later
on; an input stream is also opened when the FSReaderImpl is created.
In the case described in the attached log, synchronization is better than the ThreadLocal
because the access pattern is a sequential read.
What about the concurrent case? The worst case is: Thread1.readBlockInternal -> Thread2.readBlockInternal
-> Thread3.readBlockInternal -> Thread1.readBlockInternal -> ....
In this case, synchronization creates as many connections as the ThreadLocal does.




> Many connections to datanode are created when doing a large scan 
> -----------------------------------------------------------------
>
>                 Key: HBASE-16212
>                 URL: https://issues.apache.org/jira/browse/HBASE-16212
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.1.2
>            Reporter: Zhihua Deng
>         Attachments: HBASE-16212.patch, HBASE-16212.v2.patch, regionserver-dfsinputstream.log
>
>
> As described in https://issues.apache.org/jira/browse/HDFS-8659, the datanode is suffering
> from logging the same message repeatedly. After adding logging to DFSInputStream, it outputs
> the following:
> 2016-07-10 21:31:42,147 INFO  [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient: DFSClient_NONMAPREDUCE_1984924661_1 seek DatanodeInfoWithStorage[10.130.1.29:50010,DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK] for BP-360285305-10.130.1.11-1444619256876:blk_1109360829_35627143. pos: 111506876, targetPos: 111506843
>  ...
> Because the pos of this input stream is larger than targetPos (the position being sought),
> a new connection to the datanode is created and the older one is closed as a consequence.
> When there are many such backward seeks, the datanode's block scanner info messages spam
> the logs, and many connections to the same datanode are created.
> hadoop version: 2.7.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
