hbase-issues mailing list archives

From "Zhihua Deng (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-16212) Many connections to datanode are created when doing a large scan
Date Tue, 19 Jul 2016 04:30:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383565#comment-15383565 ]

Zhihua Deng edited comment on HBASE-16212 at 7/19/16 4:30 AM:
--------------------------------------------------------------

Changing from a ThreadLocal to a shared field does introduce a potential synchronization
bottleneck, but that costs less than the extra I/O. So the question is how often the connection
will be recreated for seeking + reading. The original ThreadLocal is declared as a non-static
private field, which means the created FSReaderImpl instance will be reused later on;
an input stream is also opened when the FSReaderImpl is created.
In the case described in the attached log, the synchronized approach is better than the
ThreadLocal when the file is read sequentially.
What about the concurrent case? The worst case is: Thread1.readBlockInternal -> Thread2.readBlockInternal
-> Thread3.readBlockInternal -> Thread1.readBlockInternal -> ...
In this case, the synchronized approach and the ThreadLocal create the same number of
connections.
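The trade-off above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not HBase's FSReaderImpl code: each read fetches a block plus the next block's header and caches that header, either per thread (as with the original ThreadLocal) or in one shared field. The class HeaderCacheSim, the 1000-byte block size, and the rule that any read starting away from the stream's current position reopens the datanode connection are hypothetical simplifications (the real DFSInputStream can skip short forward distances on the existing connection). The 33-byte header size is an assumption chosen to match the 33-byte backward seek between the logged pos and targetPos. Reads are replayed in a fixed order, so thread ids only label who issued each read; lock contention in the shared variant is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy simulation (hypothetical, not HBase code): per-thread vs shared cache of
 * the prefetched "next block header", counting how often the datanode
 * connection is (re)opened for a given sequence of reads.
 */
public class HeaderCacheSim {
    static final int HEADER = 33;   // assumed header size, matching the 33-byte seek in the log
    static final int BLOCK = 1000;  // hypothetical on-disk block size, header included

    /** Each read is {threadId, blockOffset}. Returns connections opened, including the first. */
    public static int connections(int[][] reads, boolean sharedCache) {
        Map<Integer, Long> perThread = new HashMap<>(); // thread id -> offset whose header is cached
        long shared = -1;                                // offset whose header is cached (shared field)
        long pos = -1;                                   // current stream position; -1 = not open yet
        int opened = 0;
        for (int[] r : reads) {
            int thread = r[0];
            long off = r[1];
            long cached = sharedCache ? shared : perThread.getOrDefault(thread, -1L);
            // On a cache hit the header is already in memory, so the read starts after it;
            // on a miss the header is re-read, i.e. the read starts at the block offset.
            long start = (cached == off) ? off + HEADER : off;
            if (start != pos) {
                opened++; // simplification: any jump away from pos reopens the connection
            }
            pos = off + BLOCK + HEADER; // rest of this block plus the next block's header
            long next = off + BLOCK;    // the header just prefetched belongs to this offset
            if (sharedCache) {
                shared = next;
            } else {
                perThread.put(thread, next);
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        // One scan over consecutive blocks, each read served by a different handler thread.
        int[][] sequential = {{1, 0}, {2, 1000}, {3, 2000}, {1, 3000}, {2, 4000}, {3, 5000}};
        // Worst case: three readers on different parts of the same file, interleaved.
        int[][] interleaved = {{1, 0}, {2, 10000}, {3, 20000}, {1, 1000}, {2, 11000}, {3, 21000}};
        System.out.println("sequential:  per-thread=" + connections(sequential, false)
                + " shared=" + connections(sequential, true));   // per-thread=6 shared=1
        System.out.println("interleaved: per-thread=" + connections(interleaved, false)
                + " shared=" + connections(interleaved, true));  // per-thread=6 shared=6
    }
}
```

Under these assumptions the sequential scan spread over three threads opens six connections with per-thread caching and one with the shared field, while the interleaved worst case opens six either way, which is the pattern the comment describes.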

was (Author: dengzh):
Changing from a ThreadLocal to a shared field does introduce a potential synchronization
bottleneck, but that costs less than the extra I/O. So the question is how often the connection
will be recreated for seeking + reading. The original ThreadLocal is declared as a non-static
private field, which means the created FSReaderImpl instance will be reused later on;
an input stream is also opened when the FSReaderImpl is created.
In the case described in the attached log, the synchronized approach is better than the
ThreadLocal.
What about the concurrent case? The worst case is: Thread1.readBlockInternal -> Thread2.readBlockInternal
-> Thread3.readBlockInternal -> Thread1.readBlockInternal -> ...
In this case, the synchronized approach and the ThreadLocal create the same number of
connections.

> Many connections to datanode are created when doing a large scan 
> -----------------------------------------------------------------
>
>                 Key: HBASE-16212
>                 URL: https://issues.apache.org/jira/browse/HBASE-16212
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.1.2
>            Reporter: Zhihua Deng
>         Attachments: HBASE-16212.patch, HBASE-16212.v2.patch, regionserver-dfsinputstream.log
>
>
> As described in https://issues.apache.org/jira/browse/HDFS-8659, the datanode suffers from
logging the same message repeatedly. After adding logging to DFSInputStream, it outputs the following:
> 2016-07-10 21:31:42,147 INFO  [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient: DFSClient_NONMAPREDUCE_1984924661_1 seek DatanodeInfoWithStorage[10.130.1.29:50010,DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK] for BP-360285305-10.130.1.11-1444619256876:blk_1109360829_35627143. pos: 111506876, targetPos: 111506843
>  ...
> Because the pos of this input stream is larger than targetPos (the position being sought), a new
connection to the datanode is created and the old one is closed as a consequence.
When there are many such wrong seeks, the datanode's block scanner INFO messages spam the
logs, and many connections to the same datanode are created.
> hadoop version: 2.7.1
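The seek rule in the description can be sketched as follows. This is a simplified model based loosely on DFSInputStream.seek in Hadoop 2.x, not its actual code: a forward seek that stays inside the current block and within the bytes the current block reader already has available is served by skipping, and every other seek (including any backward seek) drops the block reader, so the next read opens a new datanode connection. SeekModel, its methods, and the blockEnd/bufferedBytes values in main are hypothetical; only the log line is taken from the issue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of when a seek forces a new datanode connection. */
public class SeekModel {
    // Extracts the "pos" and "targetPos" values from a log line like the one above.
    private static final Pattern POS = Pattern.compile("pos: (\\d+), targetPos: (\\d+)");

    public static long[] parse(String logLine) {
        Matcher m = POS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no pos/targetPos in: " + logLine);
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    /** True if, under this model, the next read must open a new connection. */
    public static boolean needsNewConnection(long pos, long targetPos, long blockEnd, long bufferedBytes) {
        boolean forwardInBlock = pos <= targetPos && targetPos <= blockEnd;
        // Only a short forward skip inside the current block can reuse the connection.
        return !(forwardInBlock && targetPos - pos <= bufferedBytes);
    }

    public static void main(String[] args) {
        String line = "2016-07-10 21:31:42,147 INFO  [B.defaultRpcServer.handler=22,queue=1,port=16020] hdfs.DFSClient: "
                + "DFSClient_NONMAPREDUCE_1984924661_1 seek DatanodeInfoWithStorage[10.130.1.29:50010,"
                + "DS-086bc494-d862-470c-86e8-9cb7929985c6,DISK] for BP-360285305-10.130.1.11-1444619256876:"
                + "blk_1109360829_35627143. pos: 111506876, targetPos: 111506843";
        long[] p = parse(line);
        long pos = p[0];
        long targetPos = p[1];
        System.out.println("seek delta = " + (targetPos - pos)); // seek delta = -33
        // blockEnd and bufferedBytes are made-up values; a backward seek ignores them anyway.
        System.out.println("new connection? " + needsNewConnection(pos, targetPos, pos + 1024, 1024)); // true
    }
}
```

Applied to the logged line, the seek goes back 33 bytes, so under this model every such seek costs a fresh connection to the same datanode.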



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
