hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
Date Tue, 15 Jul 2014 17:03:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062322#comment-14062322 ]

Colin Patrick McCabe commented on HDFS-6114:
--------------------------------------------

bq. blockInfoSet is required to be sorted by lastScanTime, since the oldest-scanned block
is picked for scanning, and it will always be the first element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR
is used because BlockScanInfo#hashCode() is the default, which orders based on the blockId
rather than scan time. Do you suggest I update this hashCode() itself?

I was suggesting that you use a {{TreeSet}} or {{TreeMap}} with the same comparator as {{blockInfoSet}}.
 None of the hash sets I'm aware of shrink back down after enlarging.
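
To make the idea concrete, here is a minimal, self-contained sketch. The {{ScanInfo}} class and comparator below are hypothetical stand-ins for illustration, not the actual {{BlockScanInfo}} code: a {{TreeSet}} ordered by last scan time keeps the least-recently-scanned block at the head, without the hash-table resize issue.

{code:java}
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetSketch {
    // Hypothetical stand-in for BlockScanInfo.
    static class ScanInfo {
        final long blockId;
        final long lastScanTime;
        ScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Same idea as LAST_SCAN_TIME_COMPARATOR: order by lastScanTime,
    // breaking ties by blockId so distinct blocks never compare equal.
    static final Comparator<ScanInfo> BY_LAST_SCAN_TIME =
        Comparator.comparingLong((ScanInfo s) -> s.lastScanTime)
                  .thenComparingLong(s -> s.blockId);

    public static void main(String[] args) {
        TreeSet<ScanInfo> blockInfoSet = new TreeSet<>(BY_LAST_SCAN_TIME);
        blockInfoSet.add(new ScanInfo(1L, 2000L));
        blockInfoSet.add(new ScanInfo(2L, 1000L));
        // first() is always the least-recently-scanned block, and a
        // red-black tree frees nodes on removal, unlike a hash set,
        // whose backing array stays at its high-water mark.
        System.out.println(blockInfoSet.first().blockId); // prints 2
    }
}
{code}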

bq. So delBlockInfo and delNewBlockInfo serve separate purposes, and both are required.

I can write a version of the patch that has only one del function and only one add function.
 I am really reluctant to put another set of add/del functions on top of what's already
there, since I think it will make things hard to understand for people trying to modify this
code later or backport this patch to other branches.
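
For illustration only, reusing the {{ScanInfo}} type from the sketch above: the method names here are hypothetical and not taken from either version of the patch, but a single add/del pair that keeps a blockId lookup map and the time-sorted set in step could look like this.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: one addBlockInfo/delBlockInfo pair that keeps a
// blockId lookup map and the time-sorted set consistent with each other.
class BlockInfoIndex {
    private final TreeSet<TreeSetSketch.ScanInfo> blockInfoSet =
        new TreeSet<>(TreeSetSketch.BY_LAST_SCAN_TIME);
    private final Map<Long, TreeSetSketch.ScanInfo> blockMap = new HashMap<>();

    synchronized void addBlockInfo(TreeSetSketch.ScanInfo info) {
        blockInfoSet.add(info);
        blockMap.put(info.blockId, info);
    }

    synchronized void delBlockInfo(long blockId) {
        TreeSetSketch.ScanInfo info = blockMap.remove(blockId);
        if (info != null) {
            blockInfoSet.remove(info);
        }
    }
}
{code}

A single entry point like this makes it harder for a later change to update one structure and forget the other.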

> Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6114
>                 URL: https://issues.apache.org/jira/browse/HDFS-6114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.3.0, 2.4.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-6114.patch, HDFS-6114.patch
>
>
> 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned.
> 2. If blocks (several MB each) are written to the datanode continuously, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning blocks.
> 3. These blocks will be deleted after some time (long enough for them to be scanned).
> 4. Since block scanning is throttled, verification of all blocks takes a very long time.
> 5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, the entries in *dncp_block_verification.log.curr* (which include stale entries for deleted blocks) grow continuously, leading to a huge file.
> In one of our environments, it grew to more than 1 TB while the total number of blocks was only ~45k.
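
The failure mode in steps 1-5 can be summarized with a hedged sketch (hypothetical names, not the actual {{BlockPoolSliceScanner}} code): the log is only rolled after the scan loop drains its work queue, which continuous writes prevent.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the failure mode: rollVerificationLogs() is only
// reached once the queue drains, but continuous writes keep refilling it,
// so every verification appends to the .curr log and it grows without bound.
class ScanLoopSketch {
    private final Queue<Long> blocksToScan = new ArrayDeque<>();

    void scan() {
        while (!blocksToScan.isEmpty()) {  // never empties under continuous writes
            long blockId = blocksToScan.poll();
            verifyBlock(blockId);          // throttled; appends one entry to
                                           // dncp_block_verification.log.curr
        }
        rollVerificationLogs();            // unreachable while writes continue
    }

    void verifyBlock(long blockId) { /* throttled read + checksum */ }
    void rollVerificationLogs() { /* retire .curr, start a fresh log */ }
}
{code}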



--
This message was sent by Atlassian JIRA
(v6.2#6252)
