hbase-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: write availability
Date Tue, 07 Apr 2015 16:39:31 GMT
Hi Marcelo,

As you well know, HBase partitions your data set into row key ranges --
regions. Each region is assigned to a single region server, which is the
sole responsible host** for the availability of that region. When a region
is offline, for whatever reason, it is not available for interaction, reads
or writes. Not until the master has reassigned the region can it be
interacted with once more. However, a detail of our region recovery process
is that a region actually comes online for writes *before* it's available
for reads. That is, it can recover into a state that is
available-for-writes faster than it can recover into a state that is
available-for-reads. Despite this detail, the region remains the
responsibility of a single region server. There's no host that is buffering
writes on behalf of the client*** while a region is unavailable.
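
If it helps to see what this means from the client's side, here is a rough
sketch (not production code; the table, family, and row names are made up)
of a single write together with the retry knobs that govern how long the
client keeps trying while a region is in transition:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteRetrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // How many times the client retries an operation while the target
    // region is in transition, and how long it pauses between attempts.
    conf.setInt("hbase.client.retries.number", 10);
    conf.setInt("hbase.client.pause", 200);                 // base pause, ms
    conf.setLong("hbase.client.operation.timeout", 60000);  // overall cap, ms

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("my_table"))) {
      Put put = new Put(Bytes.toBytes("row-1"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // Blocks until the region server hosting "row-1" accepts the write,
      // or throws RetriesExhaustedException once the retries run out.
      table.put(put);
    }
  }
}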

The role of the data nodes in storage is that of data durability and
availability -- to the region servers. The HBase client works directly with
the region servers and *not* the data nodes underneath them. There's no
requirement that region servers and data nodes be collocated. This is
merely an operational convenience, one that also has data locality
benefits that we make a best effort to exploit. In your scenario
above, as long as a single data node is available to serve HDFS blocks for
read, any region server can be assigned to host the relevant region and
make use of that data node. No colocation required. Because new writes are
creating new HDFS blocks, there are no guarantees that those new blocks will
be created on the same hosts as those which already host other blocks in
the same region****.
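
To make the "client talks to region servers, not data nodes" point
concrete, here is a small sketch using the RegionLocator API (table name
and row key are placeholders); the location it hands back is always a
region server address, never a data node:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator =
             conn.getRegionLocator(TableName.valueOf("my_table"))) {
      // Ask where a row currently lives; "true" forces a fresh lookup
      // instead of using the client's cached location.
      HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row-1"), true);
      System.out.println("Region: " + loc.getRegionInfo().getRegionNameAsString());
      System.out.println("Served by region server: " + loc.getHostnamePort());
    }
  }
}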

** The region replica work in HBASE-10070 does not change this fundamental
of a single region server being the sole responsible host for a region. The
region is opened for reads on multiple hosts, but only the "primary" is the
responsible authority for that region.
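
For the curious, roughly how the replica feature is used on a release that
includes HBASE-10070 (table and family names are made up): extra replicas
are declared at table creation time, and a read opts in to possibly-stale
results on a per-request basis.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionReplicaSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Regions of this table are additionally opened read-only on two
      // secondary region servers; the primary remains the only writer.
      HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("my_table"));
      htd.addFamily(new HColumnDescriptor("cf"));
      htd.setRegionReplication(3);
      admin.createTable(htd);

      try (Table table = conn.getTable(TableName.valueOf("my_table"))) {
        // A read willing to accept possibly-stale data from a secondary.
        Get get = new Get(Bytes.toBytes("row-1"));
        get.setConsistency(Consistency.TIMELINE);
        Result result = table.get(get);
        System.out.println("stale? " + result.isStale());
      }
    }
  }
}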

*** The client itself will buffer writes locally up to a point.
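
That buffering presumably refers to the client write buffer; a minimal
sketch using BufferedMutator (buffer size and names chosen arbitrarily):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWritesSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Size of the local write buffer, in bytes.
    conf.setLong("hbase.client.write.buffer", 4 * 1024 * 1024);

    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             conn.getBufferedMutator(TableName.valueOf("my_table"))) {
      for (int i = 0; i < 10000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        // Accumulates locally; actual RPCs to the region servers happen when
        // the buffer fills up, or on flush()/close().
        mutator.mutate(put);
      }
      mutator.flush();
    }
  }
}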

**** HDFS has the concept of a block placement policy, which would allow us
to be more precise than this with block locations. Right now a block's
replicas are placed on 3 arbitrary data nodes, though one is the local data
node if the writing client is collocated and another is a data node on a
different rack. We
could implement a placement policy such that all copies of all files for a
given region always go to the same 3 data nodes. This could work in concert
with our Assignment Manager to assign the recovered regions onto hosts that
have a complete copy of its data. This could further hasten recovery
through locality, and when the region is recovered, it would still have
100% data locality. Of course, the caches would still be cold.
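
Purely hypothetical wiring for the sketch above: HDFS lets you swap in a
custom placement policy via configuration (normally in hdfs-site.xml on the
NameNode). "RegionAwarePlacementPolicy" below does not exist; it just
stands in for the policy described here.

import org.apache.hadoop.conf.Configuration;

public class PlacementPolicyWiring {
  public static Configuration regionAwareHdfsConf() {
    Configuration hdfsConf = new Configuration();
    // Point HDFS at a (hypothetical) policy that keeps all blocks of a
    // region's files on the same three data nodes.
    hdfsConf.set("dfs.block.replicator.classname",
        "org.example.hdfs.RegionAwarePlacementPolicy");
    return hdfsConf;
  }
}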

-n

On Tue, Apr 7, 2015 at 9:04 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemilita@bloomberg.net> wrote:

> Wellington,
>
> I might be misinterpreting this:
> http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration
>
> But aren't HBase region servers and HDFS datanodes always on the same
> server? With a replication factor of 3, what happens if all 3 datanodes
> hosting that information go down and one of them comes back with its disk
> intact? Considering that from the time they went down to the time it came
> back, HBase received new writes that would have gone to the same data
> node...
>
>
> From: user@hbase.apache.org
> Subject: Re: write availability
>
> The data is stored in files on HDFS. If a RS goes down, the master knows
> which regions were on that RS and which HDFS files contain data for those
> regions, so it will just assign the regions to other RSes, and those RSes
> will have access to the regions' data because it's stored on HDFS. The RS
> does not "own" the disk, that is HDFS's job, so the recovery in this case is
> transparent.
>
>
> On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemilita@bloomberg.net> wrote:
>
> > So if a RS goes down, it's assumed you lost the data on it, right?
> > HBase has replicas on HDFS, so if a RS goes down it doesn't mean I
> lost all the data, as I would still have the replicas... But what happens
> if all the RSes hosting a specific region go down?
> > What if one of those RSes comes back again with its disk intact,
> with all the data it had before crashing?
> >
> >
> > From: user@hbase.apache.org
> > Subject: Re: write availability
> >
> > When a RS goes down, the Master will try to assign the regions to the
> remaining RSes. When the RS comes back, after a while, the Master balancer
> process will redistribute regions among the RSes, so the given RS will be
> hosting regions, but not necessarily the ones it used to host before it went
> down.
> >
> >
> > On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemilita@bloomberg.net> wrote:
> >
> >>> So if the cluster is up, then you can insert records into HBase even
> though you lost a RS that was handling a specific region.
> >>
> >> What happens when the RS goes down? Will writes to that region be
> written to another region server? Does another RS assume the region's
> "range" while the RS is down?
> >>
> >> What happens when the RS that was down goes up again?
> >>
> >>
> >> From: user@hbase.apache.org
> >> Subject: Re: write availability
> >>
> >> I don’t know if I would say that…
> >>
> >> I read Marcelo’s question as “if the cluster is up, even though a RS
> may be down, can I still insert records into HBase?”
> >>
> >> So if the cluster is up, then you can insert records into HBase even
> though you lost a RS that was handling a specific region.
> >>
> >> But because he talked about syncing nodes… I could be misreading his
> initial question…
> >>
> >>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count
> >>>> on the cluster always being available to receive writes?
> >>> No, it's a CP system, not an AP system.
> >>>> so everything gets in sync when the other nodes come up again
> >>> There is no hinted handoff; it's not Cassandra.
> >>>
> >>>
> >>>
> >>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <
> >>> mvallemilita@bloomberg.net>:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count
> >>>> on the cluster always being available to receive writes?
> >>>> I might not be able to read if a region server which handles a range of
> >>>> keys is down, but will I be able to keep writing to other nodes, so that
> >>>> everything gets in sync when the other nodes come up again?
> >>>> Or might I get no write availability for a while?
> >>
> >> The opinions expressed here are mine, while they may reflect a
> cognitive thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
> >
> >
>
>
>
