From: Jonathan Hsieh <jon@cloudera.com>
Date: Tue, 3 Dec 2013 14:03:14 -0800
Subject: Re: [Shadow Regions / Read Replicas] Wal per region?
To: dev@hbase.apache.org

On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar wrote:

> On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh wrote:
>
> > Devaraj:
> > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and
> > > hence HDFS short circuit) of reads if you were to couple it with the
> > > favored nodes. The cost is of course more WAL files... In the current
> > > situation (no WALpr) it would create quite some cross-machine traffic, no?
> >
> > I think we all agree that a WAL per region isn't efficient in today's
> > spinning-hard-drive world, where we are limited to a relatively low budget
> > of seeks (though it may be better in the future with SSDs).
>
> WALpr makes sense in a fully-SSD world and if HDFS had journaling for
> writes. I don't think anybody is working on this yet.

What do you mean by journaling for writes? Do you mean having sync
operations update the file length at the NN on every call?
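To make sure I'm asking about the right thing, here is a minimal sketch of
the behavior I mean (the path is made up; the point is just that hflush()ed
bytes are readable from the datanodes, while the length the NN reports via
getFileStatus() can lag until the block completes or the file is closed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushLengthSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path wal = new Path("/tmp/wal-sketch");  // made-up path

    FSDataOutputStream out = fs.create(wal);
    out.writeBytes("edit-1\n");
    // hflush() pushes the bytes out so a tailing reader can fetch them from
    // the datanodes, but it does not report the new length of the
    // under-construction block to the NN.
    out.hflush();

    // A tailing reader that trusts only NN metadata may still see a stale
    // length here, even though the bytes above are already readable.
    System.out.println("Length per NN: " + fs.getFileStatus(wal).getLen());

    out.writeBytes("edit-2\n");
    // hsync() additionally asks the datanodes to sync to disk; it still is
    // not a per-call length update at the NN.
    out.hsync();

    out.close();  // only now does the NN record the final length
    System.out.println("Length after close: " + fs.getFileStatus(wal).getLen());
    fs.close();
  }
}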
> Full SSD clusters are already in place (Pinterest, for example), so I
> think having WALpr as a pluggable implementation makes sense. HBase
> should work with both WAL-per-regionserver (or multi-WAL) and
> WAL-per-region.

I agree here.

> > With this in mind, I am actually making the case that we would group all
> > the regions from RS-A onto the same set of preferred region servers. This
> > way we only need to have one or two other RSs tailing the RS.
> >
> > So for example, if regions X, Y, and Z were on RS-A and its HLog, the
> > shadow region memstores for X, Y, and Z would be assigned to the same one
> > or two other RSs. Ideally this would be where the HLog file replicas have
> > locality (helped by favored nodes/block affinity). Doing this, we hold the
> > number of readers on the active HLogs to a constant number and do not add
> > any new cross-machine traffic (though tailing currently has costs on the NN).
> >
> > One inefficiency we have is that with a single log per RS, we end up
> > reading log entries for tables that may not have the shadow feature
> > enabled. However, with HBase multi-WAL coming, one strategy is to shard
> > WALs to a number on the order of the number of disks on a machine (12-24
> > these days). I think a WAL per namespace (which could also be used to get
> > a WAL per table) would make sense. Sharding the HLog this way would reduce
> > the amount of irrelevant log entries read in a log-tailing scheme. It would
> > have the added benefit of reducing log-splitting work, which reduces MTTR,
> > and of allowing recovery priorities if the primaries and shadows also go
> > down. (This is a generalization of the idea of separating META out into
> > its own log.)
> >
> > Jon.
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
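P.S. To make the WAL-per-namespace sharding above a bit more concrete, here
is a rough sketch of what a pluggable grouping strategy could look like.
The interface and class names below are made up for illustration; this is
not an existing HBase API, just the shape of the idea:

import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical strategy for picking which WAL a region's edits go to.
 * A region server owns a small, fixed set of WALs; a shadow RS then only
 * tails the WALs for the groups it actually hosts.
 */
interface WALGroupingStrategy {
  /** Stable group name for a region; one WAL per group. */
  String groupFor(String namespace, String table, String encodedRegionName);
}

/** Everything in one group: today's WAL-per-regionserver behavior. */
class SingleGroupStrategy implements WALGroupingStrategy {
  public String groupFor(String namespace, String table, String encodedRegionName) {
    return "default";
  }
}

/**
 * One WAL per namespace: a shadow RS tailing the "shadowed" namespace never
 * reads edits for tables without the feature, and log splitting is
 * partitioned the same way.
 */
class NamespaceGroupingStrategy implements WALGroupingStrategy {
  public String groupFor(String namespace, String table, String encodedRegionName) {
    return "ns-" + namespace;
  }
}

/** Toy, non-thread-safe registry standing in for real per-group WAL writers. */
class WALRegistry {
  private final Map<String, StringBuilder> wals = new HashMap<String, StringBuilder>();
  private final WALGroupingStrategy strategy;

  WALRegistry(WALGroupingStrategy strategy) { this.strategy = strategy; }

  void append(String namespace, String table, String region, String edit) {
    String group = strategy.groupFor(namespace, table, region);
    StringBuilder wal = wals.get(group);
    if (wal == null) {
      wal = new StringBuilder();
      wals.put(group, wal);
    }
    wal.append(region).append(':').append(edit).append('\n');
  }

  int walCount() { return wals.size(); }
}

public class WalGroupingSketch {
  public static void main(String[] args) {
    WALRegistry byNamespace = new WALRegistry(new NamespaceGroupingStrategy());
    byNamespace.append("shadowed", "t1", "region-x", "put k1");
    byNamespace.append("default", "t2", "region-y", "put k2");
    // Two namespaces -> two WALs; a shadow RS only needs to tail "ns-shadowed".
    System.out.println("WAL count with namespace grouping: " + byNamespace.walCount());
  }
}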