hadoop-hdfs-issues mailing list archives

From "Brahma Reddy Battula (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10226) Track and use BlockScheduled size for DatanodeDescriptor instead of count.
Date Tue, 29 Mar 2016 06:20:25 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Brahma Reddy Battula updated HDFS-10226:
----------------------------------------
    Description: 
Tracking the block count results in an inaccurate estimate of remaining space when files with different block sizes are being written.

This issue can happen when parallel writes use different block sizes.

 *For Example:*  
Datanode capacity is 10GB, available is 2GB.
ClientA wants to write 2 blocks with block size 1GB.
ClientB wants to write 2 blocks with block size 128MB.

Here ClientB computes the scheduled size as 128MB * 2 = 256MB, so its write succeeds while ClientA's write fails.
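The difference between the two strategies can be sketched as below. This is a minimal illustration, not actual HDFS code: the method and variable names are hypothetical, and only the arithmetic from the example above is modeled.

```java
// Hypothetical sketch contrasting count-based vs size-based tracking of
// scheduled blocks (illustration only; not the real DatanodeDescriptor logic).
public class ScheduledSpaceDemo {
    static final long GB = 1L << 30;
    static final long MB = 1L << 20;

    // Count-based: remaining space is estimated using the *current* client's
    // block size, so the same scheduled counter yields a different (and for
    // ClientB, overly optimistic) answer per client.
    static long remainingByCount(long available, int scheduledBlocks, long myBlockSize) {
        return Math.max(0, available - (long) scheduledBlocks * myBlockSize);
    }

    // Size-based: track the actual bytes scheduled, independent of the caller.
    static long remainingBySize(long available, long scheduledBytes) {
        return Math.max(0, available - scheduledBytes);
    }

    public static void main(String[] args) {
        long available = 2 * GB;
        // ClientA scheduled 2 x 1GB blocks, ClientB scheduled 2 x 128MB blocks.
        int scheduledBlocks = 4;
        long scheduledBytes = 2 * GB + 2 * 128 * MB;

        // ClientB's count-based view: 2GB - 4 * 128MB = 1536MB apparently free.
        System.out.println(remainingByCount(available, scheduledBlocks, 128 * MB) / MB + " MB");
        // Size-based view: 2GB - 2.25GB scheduled -> 0 remaining.
        System.out.println(remainingBySize(available, scheduledBytes) / MB + " MB");
    }
}
```

With count tracking, ClientB believes 1536MB is still free and its write is accepted, while size tracking correctly reports 0 remaining once 2.25GB has been scheduled against 2GB of available space.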

  was:
Tracking the block count results in an inaccurate estimate of remaining space when files with different block sizes are being written.

 *For Example:*  
1. Datanode Capacity is 10GB, available is 2GB.
2. For NNBench testing, a small block size might be used (such as 1MB), and currently 20 blocks
are being written to the DN. The scheduled counter will be 20.
3. This counter causes no issue for NNBench blocks with a 1MB block size.

But for normal files with a 128MB block size, remaining space will be reported as 0, because
the estimate is based on the current file's block size, not the originally scheduled sizes:
20 * 128MB = 2.5GB, which exceeds the available space, so remaining is 0 for a normal block.

Here the client will get the exception:
"Could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s)
running and no node(s) are excluded in this operation"


> Track and use BlockScheduled size for DatanodeDescriptor instead of count.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10226
>                 URL: https://issues.apache.org/jira/browse/HDFS-10226
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
> Tracking the block count results in an inaccurate estimate of remaining space when files
with different block sizes are being written.
> This issue can happen when parallel writes use different block sizes.
>  *For Example:*  
> Datanode capacity is 10GB, available is 2GB.
> ClientA wants to write 2 blocks with block size 1GB.
> ClientB wants to write 2 blocks with block size 128MB.
> Here ClientB computes the scheduled size as 128MB * 2 = 256MB, so its write succeeds while
ClientA's write fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
