Date: Tue, 15 Jul 2014 09:49:05 +0000 (UTC)
From: "Vinayakumar B (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr

    [ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061895#comment-14061895 ]

Vinayakumar B commented on HDFS-6114:
-------------------------------------

bq. I don't really see a good reason to separate delBlockInfo and delNewBlockInfo. It seems like this could just lead to scenarios where we think we're deleting a block but it pops back up (because we deleted, but did not delete new)

Here, the two methods work on different sets. {{delBlockInfo}} is used in other places as well, when updating the scan time and re-sorting the blockInfoSet. {{delNewBlockInfo}} only needs to be called when deleting the block itself, as intermediate updates do not happen on this set's data. So {{delBlockInfo}} and {{delNewBlockInfo}} serve separate purposes, and both are required.

bq. I guess maybe it makes sense to separate addBlockInfo from addNewBlockInfo, just because there are places in the setup code where we're willing to add stuff directly to blockInfoSet. Even in that case, I would argue it might be easier to call addNewBlockInfo and then later roll all the newBlockInfoSet items into blockInfoSet. The problem is that having both functions creates confusion and increase the chance that someone will add an incorrect call to the wrong one later on in another change.

As I see it, both of these methods are private and act on different sets. Since the method name itself suggests that {{addNewBlockInfo}} is only for new blocks, I do not see any confusion here.

bq. It seems like a bad idea to use BlockScanInfo.LAST_SCAN_TIME_COMPARATOR for blockInfoSet, but BlockScanInfo#hashCode (i.e. the HashSet strategy) for newBlockInfoSet. Let's just use a SortedSet for both so we don't have to ponder any possible discrepancies between the comparator and the hash function.

{{blockInfoSet}} is required to be sorted by lastScanTime, as the least recently scanned block is picked for scanning next, and that block will always be the first element in this set. {{BlockScanInfo.LAST_SCAN_TIME_COMPARATOR}} is used because the default {{BlockScanInfo#hashCode()}} is based on the blockId rather than the scan time. Do you suggest updating {{hashCode()}} itself?
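To make the two-set arrangement concrete, here is a minimal, self-contained sketch; it is not the actual {{BlockPoolSliceScanner}} code from the patch. The names {{blockInfoSet}}, {{newBlockInfoSet}}, {{addBlockInfo}}, {{addNewBlockInfo}}, {{delBlockInfo}}, and {{delNewBlockInfo}} come from the discussion above, while the reduced {{BlockScanInfo}}, the blockId tie-break, and the {{rollNewBlocksIn}}/{{pickNextBlock}}/{{markScanned}} helpers are illustrative assumptions:

{code:java}
import java.util.Comparator;
import java.util.TreeSet;

/**
 * Minimal sketch of the two-set design under discussion. Not the actual
 * BlockPoolSliceScanner code: BlockScanInfo is reduced to the two fields
 * the comparator needs, and the helper names below are illustrative.
 */
public class TwoSetSketch {

  static class BlockScanInfo {
    final long blockId;
    long lastScanTime;

    BlockScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // Orders by last scan time so the least recently scanned block comes
  // first; ties broken by blockId here so distinct blocks are never
  // treated as equal (an assumption; the real comparator's tie-break
  // may differ).
  static final Comparator<BlockScanInfo> LAST_SCAN_TIME_COMPARATOR =
      Comparator.<BlockScanInfo>comparingLong(b -> b.lastScanTime)
          .thenComparingLong(b -> b.blockId);

  // Blocks eligible for scanning, always sorted by last scan time.
  private final TreeSet<BlockScanInfo> blockInfoSet =
      new TreeSet<>(LAST_SCAN_TIME_COMPARATOR);

  // Temporary holding area for newly written blocks; also a TreeSet so
  // its memory shrinks back once it is drained.
  private final TreeSet<BlockScanInfo> newBlockInfoSet =
      new TreeSet<>(LAST_SCAN_TIME_COMPARATOR);

  private synchronized void addBlockInfo(BlockScanInfo info) {
    blockInfoSet.add(info);
  }

  private synchronized void addNewBlockInfo(BlockScanInfo info) {
    newBlockInfoSet.add(info);
  }

  // Also used when only the scan time changes (see markScanned below).
  private synchronized void delBlockInfo(BlockScanInfo info) {
    blockInfoSet.remove(info);
  }

  // Only called when the block itself is deleted from the datanode.
  private synchronized void delNewBlockInfo(BlockScanInfo info) {
    newBlockInfoSet.remove(info);
  }

  // Periodic roll: move everything from the holding area into the main
  // sorted set, then empty the holding area.
  synchronized void rollNewBlocksIn() {
    blockInfoSet.addAll(newBlockInfoSet);
    newBlockInfoSet.clear();
  }

  // The scanner picks the least recently scanned block, i.e. the first
  // element of the sorted set.
  synchronized BlockScanInfo pickNextBlock() {
    return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
  }

  // Scan-time update: remove under the old sort key, mutate, re-insert,
  // so the entry is re-sorted rather than corrupting the TreeSet.
  synchronized void markScanned(BlockScanInfo info, long now) {
    delBlockInfo(info);
    info.lastScanTime = now;
    addBlockInfo(info);
  }
}
{code}

Using a {{TreeSet}} with the same comparator for both sets keeps the ordering and equality semantics identical, and, unlike a {{HashSet}}, a {{TreeSet}} does not retain an enlarged internal table after being drained.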
bq. Another problem with HashSet (compared with TreeSet) is that it never shrinks down after enlarging... a bad property for a temporary holding area

Yes, I agree with this; I will update it in the next patch.

> Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6114
>                 URL: https://issues.apache.org/jira/browse/HDFS-6114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.3.0, 2.4.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-6114.patch, HDFS-6114.patch
>
>
> 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned.
> 2. If blocks (several MBs in size) are written to the datanode continuously, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning them.
> 3. These blocks will be deleted after some time (long enough for them to get scanned).
> 4. As block scanning is throttled, verification of all blocks will take a long time.
> 5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, the entries (including stale entries for deleted blocks) in *dncp_block_verification.log.curr* continuously increase, leading to a huge file size.
> In one of our environments, it grew to more than 1 TB while the total number of blocks was only ~45k.



--
This message was sent by Atlassian JIRA
(v6.2#6252)