From: Esteban Gutierrez <esteban@cloudera.com>
Date: Tue, 7 Apr 2015 11:06:29 -0700
Subject: Re: write availability
To: Marcelo Valle <mvallemilita@bloomberg.net>
Cc: user@hbase.apache.org

--
Cloudera, Inc.

On Tue, Apr 7, 2015 at 10:36 AM, Marcelo Valle (BLOOMBERG/ LONDON)
<mvallemilita@bloomberg.net> wrote:

> Sorry, there is something I asked wrongly because I was misunderstanding
> it. 1 region server corresponds to 1 namenode, and 1 write to 1 namenode
> will replicate to 3 datanodes...

Not really, but I think we understood the failure mode you were curious to
know more about :)

> So to simplify the second question, what happens to the HBase cluster
> when 1 region server is down?
The simple case is something like this: the HBase Master will get a
notification from ZooKeeper that the znode for this RS has expired and will
start the recovery process. It will look up the existing WALs on HDFS for
this RS and start distributed log splitting of those WALs across the
cluster. Once replaying the edits (writes) found in the WALs completes, the
HBase Master will open the regions on other RSs, and reads and writes will
be available for the clients immediately. With read replicas enabled, reads
stay available and only writes are unavailable until the log replay
completes; features like distributed log replay (HBASE-7006) can help speed
up the process. HBase provides other features, like replication, which can
help you even further with HA and other disaster recovery scenarios.

If you have more questions please let us know!

esteban.

> -Marcelo
>
>
> From: Marcelo Valle (BLOOMBERG/ LONDON)
> Subject: Re: write availability
>
> Esteban,
>
> If I understood correctly what you said:
>
> "For the failure mode you mention, if all DNs go down (not the NN),
> clients will be blocked waiting for the acknowledgement of a write to the
> DNs, and after a few retries the RS will consider there was a failure
> writing to the WAL. The RS will attempt to roll the WAL one last time
> and, if that fails, the RS will treat this as a fatal condition and shut
> itself down. At this point the client has probably run out of retries and
> will throw an exception to the application."
>
> If this scenario happens, when will my application be able to accept
> writes for that region again? When I do some manual intervention on the
> server?
>
> For example: suppose I split data by user ids, so each user is stored in
> a different region. In the scenario above, my application (and also the
> HBase cluster) would be working for some users and wouldn't be working
> for users whose user id is in a "down region" (a region where all
> corresponding DNs are down, considering 1 DN per RS). Is this right?
>
> -Marcelo.
>
> From: esteban@cloudera.com
> Subject: Re: write availability
>
> Hello Marcelo,
>
> HBase has strong durability guarantees to avoid data loss. When a write
> arrives at a RegionServer, the data is persisted into a Write-Ahead Log
> (on HDFS) and temporarily in the RegionServer memory until the data from
> this memory store is flushed (also to HDFS).
>
> From the point of view of a client that is writing to HBase, if it
> receives a response for a successful write operation (put, delete,
> append, increment), then we can guarantee that the data was correctly
> persisted to HDFS in the WAL, and in case of a catastrophic failure of a
> RegionServer we will be able to recover as others have mentioned.
>
> For the failure mode you mention, if all DNs go down (not the NN),
> clients will be blocked waiting for the acknowledgement of a write to the
> DNs, and after a few retries the RS will consider there was a failure
> writing to the WAL. The RS will attempt to roll the WAL one last time
> and, if that fails, the RS will treat this as a fatal condition and shut
> itself down. At this point the client has probably run out of retries and
> will throw an exception to the application.
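To make that retry budget concrete, here is a minimal client-side sketch
(the table, row, and values are made up; the three config keys are the
standard client settings that bound how long a put blocks before giving
up):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BoundedRetryPut {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Bound how long the client blocks while a region is in recovery.
      conf.setInt("hbase.client.retries.number", 5);          // default is far higher
      conf.setLong("hbase.client.pause", 200);                // ms between retries
      conf.setLong("hbase.client.operation.timeout", 30000);  // overall cap per op, ms

      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("users"))) {
        Put put = new Put(Bytes.toBytes("user-42"));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"),
            Bytes.toBytes("Marcelo"));
        try {
          table.put(put);  // retries internally; normally rides out the recovery
        } catch (IOException e) {
          // Retries exhausted: the region stayed unavailable longer than the
          // configured budget, so surface it to the application (queue, alert...).
          System.err.println("write failed after retries: " + e.getMessage());
        }
      }
    }
  }

Lowering the retry count trades the chance of riding out a recovery for
faster failure reporting; the defaults are sized so that most clients never
see the exception.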
> If a single DN can recover before any of the RSs goes down, the writes
> will recover and the client will get the acknowledgement that data has
> been persisted to HDFS (even with a single DN at this point). During this
> period the RS logs will warn that data is getting persisted with a lower
> number of replicas and data could be at risk.
>
> If you are further interested in the write path in HBase, there is a
> really good blog post from Jimmy Xiang about this topic:
> http://blog.cloudera.com/blog/2012/06/hbase-write-path
>
> best,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Tue, Apr 7, 2015 at 9:04 AM, Marcelo Valle (BLOOMBERG/ LONDON)
> <mvallemilita@bloomberg.net> wrote:
>
>> Wellington,
>>
>> I might be misinterpreting this:
>> http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration
>>
>> But aren't HBase region servers and HDFS datanodes always on the same
>> server? With a replication factor of 3, what happens if all 3 datanodes
>> hosting that information go down and one of them comes back, but with
>> the disk intact? Considering that from the time they went down to the
>> time it went back, HBase received new writes that would go to the same
>> data node...
>>
>> From: user@hbase.apache.org
>> Subject: Re: write availability
>>
>> The data is stored in files on HDFS. If a RS goes down, the master knows
>> which regions were on that RS and which HDFS files contain data for
>> these regions, so it will just assign the regions to other RSs, and
>> these other RSs will have access to the regions' data because it's
>> stored on HDFS. The RS does not "own" the disk; this is HDFS's job, so
>> the recovery in this case is transparent.
>>
>> On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON)
>> <mvallemilita@bloomberg.net> wrote:
>>
>> > So if a RS goes down, it's assumed you lost the data on it, right?
>> > HBase has replication on HDFS, so if a RS goes down it doesn't mean I
>> > lost all the data, as I could still have the replicas... But what
>> > happens if all RSs hosting a specific region go down?
>> > What if one RS from this group comes back again, but with the disk
>> > intact, with all the data it had before crashing?
>> >
>> > From: user@hbase.apache.org
>> > Subject: Re: write availability
>> >
>> > When a RS goes down, the Master will try to assign the regions on the
>> > remaining RSes. When the RS comes back, after a while, the Master
>> > balancer process will re-distribute regions between RSes, so the given
>> > RS will be hosting regions, but not necessarily the ones it used to
>> > host before it went down.
>> >
>> > On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON)
>> > <mvallemilita@bloomberg.net> wrote:
>> >
>> >>> So if the cluster is up, then you can insert records into HBase even
>> >>> though you lost a RS that was handling a specific region.
>> >>
>> >> What happens when the RS goes down? Will writes to that region be
>> >> written to another region server? Does another RS assume the region
>> >> "range" while the RS is down?
>> >>
>> >> What happens when the RS that was down goes up again?
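A way to see that only the region-to-server mapping changes on failure,
while the data itself stays put on HDFS, is to ask the client where a row
currently lives. A small sketch, with an illustrative table name and row
key:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.util.Bytes;

  public class WhereIsMyRow {
    public static void main(String[] args) throws Exception {
      try (Connection conn =
               ConnectionFactory.createConnection(HBaseConfiguration.create());
           RegionLocator locator =
               conn.getRegionLocator(TableName.valueOf("users"))) {
        // reload=true bypasses the client's location cache, so after a RS
        // failure and reassignment this reports the region's new host.
        HRegionLocation loc =
            locator.getRegionLocation(Bytes.toBytes("user-42"), true);
        System.out.println("row 'user-42' is served by " + loc.getServerName());
      }
    }
  }

Run it before and after killing the hosting RS: the server name changes
once the Master reassigns the region, even though no table data was copied
anywhere.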
>> >> From: user@hbase.apache.org
>> >> Subject: Re: write availability
>> >>
>> >> I don't know if I would say that…
>> >>
>> >> I read Marcelo's question as "if the cluster is up, even though a RS
>> >> may be down, can I still insert records into HBase?"
>> >>
>> >> So if the cluster is up, then you can insert records into HBase even
>> >> though you lost a RS that was handling a specific region.
>> >>
>> >> But because he talked about syncing nodes… I could be misreading his
>> >> initial question…
>> >>
>> >>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak wrote:
>> >>>
>> >>>> If I have an application that writes to an HBase cluster, can I
>> >>>> count on the cluster always being available to receive writes?
>> >>> No, it's a CP system, not an AP system.
>> >>>> so everything gets in sync when the other nodes come up again
>> >>> There is no hinted handoff; it's not Cassandra.
>> >>>
>> >>>
>> >>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON)
>> >>> <mvallemilita@bloomberg.net>:
>> >>>
>> >>>> If I have an application that writes to an HBase cluster, can I
>> >>>> count on the cluster always being available to receive writes?
>> >>>> I might not be able to read if a region server which handles a
>> >>>> range of keys is down, but will I be able to keep writing to other
>> >>>> nodes, so everything gets in sync when the other nodes come up
>> >>>> again? Or might I get no write availability for a while?
>> >>
>> >> The opinions expressed here are mine; while they may reflect a
>> >> cognitive thought, that is purely accidental.
>> >> Use at your own risk.
>> >> Michael Segel
>> >> michael_segel (AT) hotmail.com