hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Multiple WALs
Date Sun, 02 Oct 2011 14:45:04 GMT
Take it easy. You were the reporter: you started the discussion thread and
logged the JIRA.

In the future, please keep as much detail from the discussion as possible in
the JIRA.

Cheers

On Sat, Oct 1, 2011 at 10:22 PM, Akash Ashok <thehellmaker@gmail.com> wrote:

> Uh oh! It's showing the reporter of the problem as me, but it wasn't me in
> reality :) I am not able to modify it. Please feel free to change it :)
>
> Cheers,
> Akash A
>
> On Sun, Oct 2, 2011 at 10:38 AM, Akash Ashok <thehellmaker@gmail.com>
> wrote:
>
> > I've opened up a JIRA for this
> > https://issues.apache.org/jira/browse/HBASE-4529
> >
> > Cheers,
> > Akash A
> >
> >
> > On Sun, Oct 2, 2011 at 6:04 AM, karthik tunga <karthik.tunga@gmail.com>
> > wrote:
> >
> >> Hey Stack,
> >>
> >> Along with the log replaying part, logic is also needed for log roll
> >> over. This, I think, is easier compared to the merging of the logs. Any
> >> edits with a sequence number less than the last sequence number on the
> >> file system can be removed from all the WALs.
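> >>
> >> As a rough sketch of that check (the names here are hypothetical, not
> >> existing HBase identifiers):
> >>
> >>   // A wal can be archived on roll-over once every edit in it has a
> >>   // sequence number below the last one persisted to the file system.
> >>   boolean canArchive(long maxSeqIdInWal, long lastFlushedSeqId) {
> >>     return maxSeqIdInWal < lastFlushedSeqId;
> >>   }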
> >>
> >> Cheers,
> >> Karthik
> >>
> >> On 1 October 2011 18:05, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> >>
> >> > I think adding the abstraction layer and making it not only
> >> > pluggable, but configurable would be great.
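> >> >
> >> > As a sketch, the abstraction could be as small as this (a hypothetical
> >> > interface, not an existing HBase API):
> >> >
> >> >   // Minimal pluggable wal contract; the default impl would wrap the
> >> >   // current HDFS-backed log, others could target a custom service.
> >> >   interface WriteAheadLog {
> >> >     long append(byte[] regionName, WALEdit edits) throws IOException;
> >> >     void sync() throws IOException;       // the durability point
> >> >     void rollWriter() throws IOException; // start a new log file
> >> >     void close() throws IOException;
> >> >   }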
> >> >
> >> > It would be nice to be able to tie into a service that logs directly
> >> > to disk, rather than going through HDFS, giving some potentially
> >> > awesome speedup at the cost of having to write a logging service that
> >> > handles replication, etc.
> >> > Side note: Accumulo is using their own service to store the WAL,
> >> > rather than HDFS, and I suspect that plays a big role in people's
> >> > claims of its ability to 'outperform' HBase.
> >> >
> >> > -Jesse Yates
> >> >
> >> > On Sat, Oct 1, 2011 at 2:04 PM, Stack <stack@duboce.net> wrote:
> >> >
> >> > > Yes.  For sure.  Would need to check that the split can deal w/
> >> > > multiple logs written by the one server concurrently (sort by
> >> > > sequence edit id after sorting on all the rest that makes up a wal
> >> > > log key).
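> >> > >
> >> > > Something like this for the sort during split (a sketch; the key
> >> > > names are made up, and the real key also carries the table name and
> >> > > write time):
> >> > >
> >> > >   // Group a server's edits by region first, then replay each
> >> > >   // region's edits in sequence-id order across all of its wals.
> >> > >   int compare(WalKey a, WalKey b) {
> >> > >     int c = Bytes.compareTo(a.getRegionName(), b.getRegionName());
> >> > >     return c != 0 ? c : Long.compare(a.getSeqNum(), b.getSeqNum());
> >> > >   }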
> >> > >
> >> > > St.Ack
> >> > >
> >> > > On Sat, Oct 1, 2011 at 1:36 PM, karthik tunga
> >> > > <karthik.tunga@gmail.com> wrote:
> >> > > > Hey,
> >> > > >
> >> > > > Don't multiple WALs need some kind of merging when recovering
> >> > > > from a crash?
> >> > > >
> >> > > > Cheers,
> >> > > > Karthik
> >> > > >
> >> > > >
> >> > > > On 1 October 2011 15:17, Stack <stack@duboce.net> wrote:
> >> > > >
> >> > > >> +1 on making WAL pluggable so we can experiment.  Being able to
> >> > > >> write multiple WALs at once should be easy enough to do (the WAL
> >> > > >> split code should be able to handle it). Also, a suggestion made a
> >> > > >> while back was making it so hbase could be configured to write to
> >> > > >> two filesystems -- there'd be hbase.rootdir as now -- and then
> >> > > >> we'd allow specifying another fs to use for writing WALs (if not
> >> > > >> specified, we'd just use hbase.rootdir for all filesystem
> >> > > >> interactions as now).
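> >> > > >>
> >> > > >> Config-wise, something like this (hbase.wal.dir is a hypothetical
> >> > > >> key here, just to illustrate the idea):
> >> > > >>
> >> > > >>   Configuration conf = HBaseConfiguration.create();
> >> > > >>   conf.set("hbase.rootdir", "hdfs://nn-a:8020/hbase");
> >> > > >>   // hypothetical: point wals at a second filesystem; when unset,
> >> > > >>   // fall back to hbase.rootdir for everything, as now.
> >> > > >>   conf.set("hbase.wal.dir", "hdfs://nn-b:8020/hbase-wals");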
> >> > > >>
> >> > > >> St.Ack
> >> > > >>
> >> > > >> On Sat, Oct 1, 2011 at 10:56 AM, Dhruba Borthakur
> >> > > >> <dhruba@gmail.com> wrote:
> >> > > >> > I have been experimenting with the WAL settings too. It is
> >> > > >> > obvious that turning off the wal makes your transactions go
> >> > > >> > faster; HDFS write/sync are not yet very optimized for high
> >> > > >> > throughput small writes.
> >> > > >> >
> >> > > >> > However, irrespective of whether I have one wal or two, I am
> >> > > >> > seeing the same throughput. I have experimented with an HDFS
> >> > > >> > setting that allows writing/sync to multiple replicas in
> >> > > >> > parallel, and that has increased performance for my test
> >> > > >> > workload; see https://issues.apache.org/jira/browse/HDFS-1783.
> >> > > >> >
> >> > > >> > About using one wal or two, it will be nice if we can separate
> >> > > >> > out the wal API elegantly and make it pluggable. In that case,
> >> > > >> > we can experiment with HBase on multiple systems. Once we have
> >> > > >> > it pluggable, we can make the hbase wal go to a separate HDFS
> >> > > >> > (pure SSD based, maybe?).
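> >> > > >> >
> >> > > >> > The hookup could be a simple reflection-based factory (the
> >> > > >> > config key and both class names below are hypothetical):
> >> > > >> >
> >> > > >> >   // Pick the wal implementation from config; HdfsWal is the
> >> > > >> >   // assumed default implementing a WriteAheadLog interface.
> >> > > >> >   Class<? extends WriteAheadLog> impl = conf.getClass(
> >> > > >> >       "hbase.regionserver.wal.impl",
> >> > > >> >       HdfsWal.class, WriteAheadLog.class);
> >> > > >> >   WriteAheadLog wal = ReflectionUtils.newInstance(impl, conf);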
> >> > > >> >
> >> > > >> > -dhruba
> >> > > >> >
> >> > > >> >
> >> > > >> > On Sat, Oct 1, 2011 at 8:09 AM, Akash Ashok
> >> > > >> > <thehellmaker@gmail.com> wrote:
> >> > > >> >
> >> > > >> >> Hey,
> >> > > >> >> I've seen that setting writeToWAL(false) boosts the writes like
> >> > > >> >> crazy. I was just thinking about having multiple WALs in HBase.
> >> > > >> >> I understand that a consideration in the BigTable paper is that
> >> > > >> >> a WAL per region is not used because it might result in a lot
> >> > > >> >> of disk seeks when there are a large number of regions. But how
> >> > > >> >> about having as many WALs as the number of hard drives in the
> >> > > >> >> system? I see that the recommended configs for HBase are 4 - 12
> >> > > >> >> hard drives per node. This might kick the writes up a notch.
> >> > > >> >>
> >> > > >> >> Would like to know the general opinion on this one?
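> >> > > >> >>
> >> > > >> >> For reference, the switch mentioned above is set per Put on the
> >> > > >> >> client (a sketch; assumes an HTable named table opened
> >> > > >> >> elsewhere):
> >> > > >> >>
> >> > > >> >>   // Skip the wal for this put: much faster writes, but the
> >> > > >> >>   // edit is lost if the region server dies before a memstore
> >> > > >> >>   // flush.
> >> > > >> >>   Put p = new Put(Bytes.toBytes("row1"));
> >> > > >> >>   p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
> >> > > >> >>       Bytes.toBytes("v"));
> >> > > >> >>   p.setWriteToWAL(false);
> >> > > >> >>   table.put(p);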
> >> > > >> >>
> >> > > >> >> Cheers,
> >> > > >> >> Akash A
> >> > > >> >>
> >> > > >> >
> >> > > >> >
> >> > > >> >
> >> > > >> > --
> >> > > >> > Connect to me at http://www.facebook.com/dhruba
> >> > > >> >
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
