Subject: Re: Multiple WALs
From: Ted Yu
To: dev@hbase.apache.org
Date: Sun, 2 Oct 2011 07:45:04 -0700

Take it easy. You were the reporter: you started the discussion thread and
logged the JIRA.

In the future, please keep as much of the detail from the discussion as
possible in the JIRA.

Cheers

On Sat, Oct 1, 2011 at 10:22 PM, Akash Ashok wrote:

> Uh oh! It's showing the reporter of the problem as me, but it wasn't me in
> reality :) I am not able to modify it. Please feel free to change it :)
>
> Cheers,
> Akash A
>
> On Sun, Oct 2, 2011 at 10:38 AM, Akash Ashok wrote:
>
> > I've opened up a JIRA for this:
> > https://issues.apache.org/jira/browse/HBASE-4529
> >
> > Cheers,
> > Akash A
> >
> > On Sun, Oct 2, 2011 at 6:04 AM, karthik tunga wrote:
> >
> >> Hey Stack,
> >>
> >> Along with the log-replay part, logic is also needed for log roll-over.
> >> This, I think, is easier compared to merging the logs. Any edits with a
> >> sequence number lower than the last sequence number on the file system
> >> can be removed from all the WALs.
> >>
> >> Cheers,
> >> Karthik
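For illustration only, the roll-over rule described above might look roughly
like the sketch below: a rolled WAL file becomes removable once every region
on the server has flushed past the highest sequence id that file contains.
The WalFile and WalCleaner names are invented for the example; this is not
the actual HLog code.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical types for illustration; not part of HBase.
final class WalFile {
    final String path;
    final long maxSeqId;   // highest sequence id written into this WAL file

    WalFile(String path, long maxSeqId) {
        this.path = path;
        this.maxSeqId = maxSeqId;
    }
}

final class WalCleaner {
    // A rolled WAL file can be removed once every region on the server has
    // flushed past the highest sequence id the file contains, i.e. none of
    // its edits would be needed for replay after a crash.
    static List<WalFile> removable(List<WalFile> rolledWals,
                                   Collection<Long> lowestUnflushedSeqIdPerRegion) {
        long oldestUnflushed = Long.MAX_VALUE;
        for (long seqId : lowestUnflushedSeqIdPerRegion) {
            oldestUnflushed = Math.min(oldestUnflushed, seqId);
        }
        List<WalFile> out = new ArrayList<>();
        for (WalFile wal : rolledWals) {
            if (wal.maxSeqId < oldestUnflushed) {
                out.add(wal);
            }
        }
        return out;
    }
}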
> >>
> >> On 1 October 2011 18:05, Jesse Yates wrote:
> >>
> >> > I think adding the abstraction layer and making it not only pluggable,
> >> > but configurable, would be great.
> >> >
> >> > It would be nice to be able to tie into a service that logs directly to
> >> > disk, rather than going through HDFS, giving a potentially awesome
> >> > speedup at the cost of having to write a logging service that handles
> >> > replication, etc.
> >> > Side note: Accumulo is using its own service for storing the WAL rather
> >> > than HDFS, and I suspect that plays a big role in people's claims about
> >> > its ability to 'outperform' HBase.
> >> >
> >> > -Jesse Yates
> >> >
> >> > On Sat, Oct 1, 2011 at 2:04 PM, Stack wrote:
> >> >
> >> > > Yes, for sure. We would need to check that the split can deal with
> >> > > multiple logs written by the one server concurrently (sort by the
> >> > > sequence edit id after sorting on all the rest that makes up a WAL
> >> > > log key).
> >> > >
> >> > > St.Ack
> >> > >
> >> > > On Sat, Oct 1, 2011 at 1:36 PM, karthik tunga <karthik.tunga@gmail.com> wrote:
> >> > > > Hey,
> >> > > >
> >> > > > Don't multiple WALs need some kind of merging when recovering from a
> >> > > > crash?
> >> > > >
> >> > > > Cheers,
> >> > > > Karthik
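As a sketch of the ordering Stack describes and the merge Karthik asks about:
if one server writes several WALs concurrently, split/replay has to put the
edits back into a single order, grouping on the rest of the log key first and
falling back to the sequence edit id. The LogKey and SplitSort types below
are made up for the example and are not HBase's HLogKey classes.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a WAL log key; not HBase's HLogKey.
final class LogKey {
    final String regionName;
    final long seqId;     // sequence edit id assigned when the edit was appended

    LogKey(String regionName, long seqId) {
        this.regionName = regionName;
        this.seqId = seqId;
    }
}

final class SplitSort {
    // Order edits the way split/replay would need them: group by region first,
    // then fall back to the sequence edit id, so edits from several WALs written
    // by the same server interleave back into their original write order.
    static final Comparator<LogKey> REPLAY_ORDER =
        Comparator.comparing((LogKey k) -> k.regionName)
                  .thenComparingLong(k -> k.seqId);

    static List<LogKey> sortForReplay(List<LogKey> editsFromAllWals) {
        List<LogKey> sorted = new ArrayList<>(editsFromAllWals);
        sorted.sort(REPLAY_ORDER);
        return sorted;
    }
}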
> >> > > >
> >> > > > On 1 October 2011 15:17, Stack wrote:
> >> > > >
> >> > > >> +1 on making the WAL pluggable so we can experiment. Being able to
> >> > > >> write multiple WALs at once should be easy enough to do (the WAL
> >> > > >> split code should be able to handle it). Also, a suggestion made a
> >> > > >> while back was making it so HBase could be configured to write to
> >> > > >> two filesystems -- there'd be hbase.rootdir as now, and then we'd
> >> > > >> allow specifying another fs to use for writing WALs (if not
> >> > > >> specified, we'd just use hbase.rootdir for all filesystem
> >> > > >> interactions, as now).
> >> > > >>
> >> > > >> St.Ack
> >> > > >>
> >> > > >> On Sat, Oct 1, 2011 at 10:56 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> >> > > >> > I have been experimenting with the WAL settings too. It is obvious
> >> > > >> > that turning off the WAL makes your transactions go faster; HDFS
> >> > > >> > write/sync are not yet very optimized for high-throughput small
> >> > > >> > writes.
> >> > > >> >
> >> > > >> > However, irrespective of whether I have one WAL or two, I have seen
> >> > > >> > the same throughput. I have experimented with an HDFS setting that
> >> > > >> > allows writing/syncing to multiple replicas in parallel, and that
> >> > > >> > has increased performance for my test workload; see
> >> > > >> > https://issues.apache.org/jira/browse/HDFS-1783.
> >> > > >> >
> >> > > >> > About using one WAL or two: it would be nice if we can separate out
> >> > > >> > the WAL API elegantly and make it pluggable. In that case, we can
> >> > > >> > experiment with HBase on multiple systems. Once we have it
> >> > > >> > pluggable, we can make the HBase WAL go to a separate HDFS (pure
> >> > > >> > SSD-based, maybe?).
> >> > > >> >
> >> > > >> > -dhruba
> >> > > >> >
> >> > > >> > On Sat, Oct 1, 2011 at 8:09 AM, Akash Ashok <thehellmaker@gmail.com> wrote:
> >> > > >> >
> >> > > >> >> Hey,
> >> > > >> >> I've seen that setting writeToWAL(false) boosts up the writes like
> >> > > >> >> crazy. I was just thinking about having multiple WALs in HBase. I
> >> > > >> >> understand that this was a consideration in the BigTable paper: a
> >> > > >> >> WAL per region is not used because it might result in a lot of
> >> > > >> >> disk seeks when there are a large number of regions. But how about
> >> > > >> >> having as many WALs as the number of hard drives in the system? I
> >> > > >> >> see that the recommended configs for HBase are 4-12 hard drives
> >> > > >> >> per node. This might kick the writes up a notch.
> >> > > >> >>
> >> > > >> >> Would like to know the general opinion on this one?
> >> > > >> >>
> >> > > >> >> Cheers,
> >> > > >> >> Akash A
> >> > > >> >
> >> > > >> > --
> >> > > >> > Connect to me at http://www.facebook.com/dhruba
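One last illustrative sketch, of the client-side knob Akash mentions above:
in the HBase client API of this era, a Put can skip the write-ahead log via
setWriteToWAL(false), trading durability for write throughput. The table,
row, and column names below are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");   // hypothetical table name
        try {
            Put put = new Put(Bytes.toBytes("row-1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            // Skip the write-ahead log: much faster, but this edit is lost
            // if the region server crashes before the memstore is flushed.
            put.setWriteToWAL(false);
            table.put(put);
        } finally {
            table.close();
        }
    }
}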