Message-ID: <26769722.290091278701339492.JavaMail.jira@thor>
Date: Fri, 9 Jul 2010 14:48:59 -0400 (EDT)
From: "Rodrigo Schmidt (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1094) Intelligent block placement policy to decrease probability of block loss
In-Reply-To: <24967451.18401271056063147.JavaMail.jira@thor>

[
https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886823#action_12886823 ]

Rodrigo Schmidt commented on HDFS-1094:
---------------------------------------

@Joydeep: Using a term to describe the groups is a great idea. Just bear in mind that we are not dividing the cluster into exclusive groups. Each block has a limited set of nodes on which it can be replicated, but separate blocks might have different sets of nodes associated with them, and the intersection between these sets might not be empty.

Here is a sketch of the algorithm to make things a little clearer. This is a simplification, and I am not describing several corner cases and reconfiguration scenarios.

{code}
Configuration parameters:
  - R: rack window, distance from the initial rack within which we are allowed to place replicas
  - M: machine window, size of the machine window within a rack

Whenever network topology changes:
  - Sort racks into a logical ring, based on rack name
  - Sort nodes within each rack into logical rings, based on node names

For the first replica:
  - Write to the local machine, if possible, or pick a random one

For the second replica:
  - Let r be the rack in which the first replica was placed
  - Let i be the index of the machine in r that keeps the first replica
  - Pick random rack r2 that is within R racks from r
  - Pick random machine m2 in r2 that is within window [i, (i+M-1)%racksize]
  - Place replica in m2

For the third replica:
  - Given steps above, pick another random machine m3 in r2 that is within the same window used for m2
  - Make sure m2 != m3
  - Place replica in m3
{code}

I hope this explanation helps clear up the confusion.
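To make the steps above concrete, here is a minimal Python sketch of the placement logic. It is an illustration under stated assumptions, not the HDFS implementation: the names build_rings, window, and place_replicas are hypothetical, "within R racks from r" is read as the next R racks on the ring (excluding r itself), and when racks differ in size the index i is wrapped modulo the size of r2. Corner cases and reconfiguration are ignored, as in the description.

```python
import random


def build_rings(topology):
    """Sort racks by name and the nodes inside each rack by name.

    `topology` maps rack name -> iterable of node names (hypothetical input shape).
    """
    racks = sorted(topology)
    nodes = {rack: sorted(topology[rack]) for rack in racks}
    return racks, nodes


def window(ring, start, size):
    """Return up to `size` distinct entries of `ring`, starting at `start` and wrapping."""
    n = len(ring)
    return [ring[(start + k) % n] for k in range(min(size, n))]


def place_replicas(topology, R, M, writer=None, rng=random):
    """Pick (rack, node) pairs for three replicas of one block."""
    racks, nodes = build_rings(topology)
    location = {node: rack for rack in racks for node in nodes[rack]}

    # First replica: the local machine if the writer is a known node, else a random one.
    first = writer if writer in location else rng.choice(sorted(location))
    r = location[first]
    i = nodes[r].index(first)

    # Second replica: a random rack among the next R racks on the ring.
    # Assumption: forward direction only, and r itself is excluded.
    r_idx = racks.index(r)
    ahead = [racks[(r_idx + d) % len(racks)] for d in range(1, R + 1)]
    candidates = [c for c in dict.fromkeys(ahead) if c != r]
    if not candidates:
        raise ValueError("no remote rack available within the rack window")
    r2 = rng.choice(candidates)

    # Machine window in r2 starts at index i (wrapped) and spans M machines.
    machines = window(nodes[r2], i, M)
    if len(machines) < 2:
        raise ValueError("machine window must contain at least two nodes")

    # Second and third replicas: two distinct machines from the same window.
    m2, m3 = rng.sample(machines, 2)
    return [(r, first), (r2, m2), (r2, m3)]
```

For example, with five racks of six nodes each, place_replicas(topology, R=2, M=3, writer="r1n4") keeps the first replica on r1n4 and draws the other two from indices 4, 5, and 0 of either of the next two racks. A smaller R and M shrink the number of distinct node triples that can hold a block, which is the property the comment is relying on.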

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and the other two replicas are on any two random nodes on a random remote rack. This means that if any three datanodes die together, then there is a non-trivial probability of losing at least one block in the cluster. This JIRA is to discuss if there is a better algorithm that can lower probability of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.