Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7946090E9 for ; Fri, 21 Oct 2011 06:32:59 +0000 (UTC) Received: (qmail 27676 invoked by uid 500); 21 Oct 2011 06:32:58 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 27637 invoked by uid 500); 21 Oct 2011 06:32:57 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 27622 invoked by uid 99); 21 Oct 2011 06:32:56 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Oct 2011 06:32:56 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 21 Oct 2011 06:32:55 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id B025330C16A for ; Fri, 21 Oct 2011 06:30:35 +0000 (UTC) Date: Fri, 21 Oct 2011 06:30:35 +0000 (UTC) From: "jiraposter@reviews.apache.org (Commented) (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: <1187966684.1018.1319178635723.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1236529694.13830.1319078412821.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132364#comment-13132364 ] jiraposter@reviews.apache.org commented on HDFS-2476: ----------------------------------------------------- ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2515/#review2738 ----------------------------------------------------------- Ship it! Looks very good. See the small nits below. Are there any non-trivial changes in this patch compared to what you're running in production? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java This javadoc describes a linked hashset, but the implementation is just a normal hashset. Copy-paste error? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java can hashCode and element be marked final? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java formatting is a little off - should go on same line trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java I find this javadoc a little misleading - it implies that there's a linked list of elements separate from the hashtable chains. The removal order is in hash-bucket order rather than insertion order. "Maybe something like: removes N entries from the hashtable. The order in which entries are removed is unspecified, and may not correspond to the order in which they were inserted." - Todd On 2011-10-20 22:58:45, Tomasz Nykiel wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2515/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-10-20 22:58:45) bq. bq. bq. Review request for Hairong Kuang. bq. bq. bq. Summary bq. ------- bq. bq. This patch introduces two hash data structures for storing under-replicated, over-replicated and invalidated blocks. bq. bq. 1. LightWeightHashSet bq. 2. LightWeightLinkedSet bq. bq. Currently in all these cases we are using java.util.TreeSet which adds unnecessary overhead. bq. bq. The main bottlenecks addressed by this patch are: bq. -cluster instability times, when these queues (especially under-replicated) tend to grow quite drastically, bq. -initial cluster startup, when the queues are initialized, after leaving safemode, bq. -block reports, bq. -explicit acks for block addition and deletion bq. bq. 1. The introduced structures are CPU-optimized. bq. 2. They shrink and expand according to current capacity. bq. 3. Add/contains/delete ops are performed in O(1) time (unlike current log n for TreeSet). bq. 4. The sets are equipped with fast access methods for polling a number of elements (get+remove), which are used for handling the queues. bq. bq. bq. This addresses bug HDFS-2476. bq. https://issues.apache.org/jira/browse/HDFS-2476 bq. bq. bq. Diffs bq. ----- bq. bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightLinkedSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightHashSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightLinkedSet.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2515/diff bq. bq. bq. Testing bq. ------- bq. bq. Provided JUnit tests. bq. bq. bq. Thanks, bq. bq. Tomasz bq. bq. > More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks > ---------------------------------------------------------------------------------------- > > Key: HDFS-2476 > URL: https://issues.apache.org/jira/browse/HDFS-2476 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node > Reporter: Tomasz Nykiel > Assignee: Tomasz Nykiel > Attachments: hashStructures.patch, hashStructures.patch-2 > > > This patch introduces two hash data structures for storing under-replicated, over-replicated and invalidated blocks. > 1. LightWeightHashSet > 2. LightWeightLinkedSet > Currently in all these cases we are using java.util.TreeSet which adds unnecessary overhead. > The main bottlenecks addressed by this patch are: > -cluster instability times, when these queues (especially under-replicated) tend to grow quite drastically, > -initial cluster startup, when the queues are initialized, after leaving safemode, > -block reports, > -explicit acks for block addition and deletion > 1. The introduced structures are CPU-optimized. > 2. They shrink and expand according to current capacity. > 3. Add/contains/delete ops are performed in O(1) time (unlike current log n for TreeSet). > 4. The sets are equipped with fast access methods for polling a number of elements (get+remove), which are used for handling the queues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira