From: Ted Dunning
Date: Tue, 13 Sep 2011 11:20:28 +0000
Subject: Re: Regarding design of HDFS
To: hdfs-user@hadoop.apache.org

2011/9/13 kang hua <kanghua151@msn.com>

> Hi Master:
>     can you explain more detail --- "The only way to avoid this is to
> make the data much more cacheable and to have a viable cache coherency
> strategy. Cache coherency at the meta-data level is difficult. Cache
> coherency at the block level is also difficult (but not as difficult)
> because many blocks get moved for balance purposes"
>     why "Cache coherency at the meta-data level is difficult" ?

I said this because meta-data is updated often. Caching in the presence
of frequent updates requires some sort of coherency model. For meta-data,
it is difficult to detect stale information at the point of use, and
using stale information can be disastrous. Thus, caching is difficult.

>     why "Cache coherency at the block level is also difficult (but not as
> difficult) because many blocks get moved for balance purposes"

The basic problem here is update rate. Late detection of stale
information is much easier, however, since you can just note that the
block isn't where you thought it was and update your cache. There are
still problems, and the fact that race conditions are still being found
in the HDFS lease management code is an indicator that this isn't a
completely trivial problem.
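
To make the block-level case concrete, here is a minimal sketch of late detection of a stale cache entry: the client caches where a block lives, discovers on use that the block has moved, and refreshes the entry. Everything in it (NameService, Cluster, CachingClient, move_block, BlockNotHere) is a hypothetical toy written for this illustration and is not the HDFS client API. It is single-threaded, so it ignores the race conditions mentioned above.

```python
class BlockNotHere(Exception):
    """Raised by a (toy) data node that no longer holds the requested block."""


class NameService:
    """Hypothetical authoritative block -> node map; stands in for the name node."""

    def __init__(self):
        self.locations = {}  # block_id -> node name
        self.lookups = 0     # counts how often clients had to ask

    def locate(self, block_id):
        self.lookups += 1
        return self.locations[block_id]


class Cluster:
    """Hypothetical set of data nodes: node name -> {block_id: data}."""

    def __init__(self, names):
        self.nodes = {name: {} for name in names}

    def read(self, node, block_id):
        try:
            return self.nodes[node][block_id]
        except KeyError:
            # The node can tell the caller its information is stale.
            raise BlockNotHere(block_id) from None


class CachingClient:
    """Client that caches block locations and repairs stale entries on use."""

    def __init__(self, name_service, cluster):
        self.ns = name_service
        self.cluster = cluster
        self.cache = {}  # block_id -> node name, possibly stale

    def read(self, block_id):
        node = self.cache.get(block_id)
        if node is None:
            node = self.cache[block_id] = self.ns.locate(block_id)
        try:
            return self.cluster.read(node, block_id)
        except BlockNotHere:
            # Late detection: the block isn't where we thought it was.
            # Refresh the cache entry and retry once.
            node = self.cache[block_id] = self.ns.locate(block_id)
            return self.cluster.read(node, block_id)


def move_block(ns, cluster, block_id, dst):
    """Simulate a balancer moving a block to another node."""
    src = ns.locations[block_id]
    cluster.nodes[dst][block_id] = cluster.nodes[src].pop(block_id)
    ns.locations[block_id] = dst
```

The trick works only because a read against the wrong node fails loudly, so the stale entry announces itself. A stale meta-data entry typically still looks valid when it is used, so nothing signals that it is out of date, which is why the same approach does not carry over to the meta-data level.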