hbase-dev mailing list archives

From Umesh Agashe <uaga...@cloudera.com>
Subject Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)
Date Sat, 18 Feb 2017 00:16:09 GMT
Hi,

Here is the doc that summarizes our discussion of why we think a top-down
approach requiring radical code changes, rather than an incremental, phased
(bottom-up) approach, will help us redo the FS directory layout.


https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j

Thanks,
Umesh


On Fri, Feb 17, 2017 at 12:57 PM, Stack <stack@duboce.net> wrote:

> Notes from this morning's online meeting @10AM PST (please fill in any
> detail I missed):
>
> IN ATTENDANCE:
> Aman Poonia
> Umesh Agashe, Cloudera
> Stephen Tak, AMZ
> Zach York, AMZ
> Francis Liu, Yahoo!
> Ben Mau, Yahoo!
> Sean Busbey, Cloudera
> Ted Yu, HWX
> Appy (Apekshit Sharma), Cloudera
>
>
> BACKGROUND (St.Ack)
> Y! want to do millions of regions in a Cluster.
> Our current FS layout is heavily dependent on HDFS semantics (e.g. we depend
> heavily on HDFS rename doing atomic file and directory swaps), which
> complicates being able to run on another FS.
> HBase is bound to a particular physical layout in the FS.
> Matteo Bertozzi's experience with HDFS/HBase on S3 and a general irritation
> with how FS ops are distributed all about the codebase had him propose a
> logical tier with a radically simplified set of requirements of underlying
> FS (block store?); atomic operations would be done by HBase rather than
> farmed out to the FS.
> Matteo not w/ us anymore but he passed on the vision to Umesh
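
To make the "atomic operations done by HBase rather than the FS" idea concrete, here is a minimal in-memory sketch. It is not the PoC in [2] and not any HBase API: `BlobStore`, `Catalog`, `swap` and `compact` are hypothetical names made up for illustration, and a plain lock stands in for whatever atomicity a metadata table such as hbase:meta would provide. The underlying store only needs put/get/delete; the commit point is a metadata update, not a rename.

```python
import threading
import uuid


class BlobStore:
    """Hypothetical minimal store: put, get, delete. No rename needed."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        blob_id = uuid.uuid4().hex  # unique id, so writers never collide
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

    def delete(self, blob_id: str) -> None:
        self._blobs.pop(blob_id, None)


class Catalog:
    """Stand-in for file metadata kept in a table (assumption: like hbase:meta)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}  # region name -> frozenset of blob ids

    def files(self, region: str) -> frozenset:
        with self._lock:
            return self._files.get(region, frozenset())

    def swap(self, region: str, remove, add) -> bool:
        """Atomically replace `remove` with `add`; refuse if `remove` is stale."""
        with self._lock:
            current = self._files.get(region, frozenset())
            if not set(remove) <= current:
                return False
            self._files[region] = (current - set(remove)) | set(add)
            return True


def compact(store: BlobStore, catalog: Catalog, region: str):
    """Merge a region's blobs into one and commit via the catalog."""
    old = catalog.files(region)
    # Merge order is arbitrary here; a real compaction would merge sorted cells.
    merged = b"".join(store.get(b) for b in sorted(old))
    new_id = store.put(merged)  # not visible until the catalog references it
    if catalog.swap(region, old, {new_id}):
        for b in old:
            store.delete(b)  # no longer referenced by the catalog
        return new_id
    store.delete(new_id)  # another writer changed the region; drop our output
    return None
```

Readers only ever see the file set recorded in the catalog, so a half-written or abandoned blob is never visible, which is the property the rename trick provides on HDFS today.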
>
> CURRENT STATE OF FS REDO PROJECT (Umesh)
> Currently it is shelved but hope to get back to it 'soon'.
> Spent a few months on FS REDO at end of last year.
> Initial approach was to abstract out three Interfaces (original sketched by
> Matteo in [1]).
> Idea was to centralize all FS use in a few well-known locations.
> Then refactor all FS usage.
> Keep all meta data about tables, files, etc., in hbase:meta
> Idea was to slowly migrate over ops, tools etc., to the new Interface.
> This was a bottom-up approach, finding FS references, and moving references
> to one place.
> Soon found too many refs all over the code.
> Found that we might not get to desired simple Interface because API had to
> carry around baggage.
> Matteo had tried this approach in [1] and started to argue this stepped
> migration would never arrive.
>
> So started over w/ the ideal Simple FS Interface and the implementation
> seemed to flow smoothly.
> An in-memory POC that did simple file ops was posted a while back here [2].
>
> Given the two approaches taken above, experience indicates that the
> radical, top-down approach is more likely to succeed.
>
> WHY ARE PEOPLE INTERESTED IN FS REDO?
> Francis and Ben Mau, we want to be able to do 1M regions.
> St.Ack suggested that even small installs need to be able to do more,
> smaller regions.
> Zach is interested because he wants to optimize HBase over S3 (rename,
> consistency issues). Liked the idea of metadata up in the hbase:meta table
> and avoiding renames, etc.
>
> WHAT SHOULD WE DO?
> We have few resources. It is a big job (we've been talking about it a good
> while now). All docs are stale, missing the benefit of Umesh's recent
> explorations.
> Sean pointed out that before shelving, the idea was to try the PoC
> Interface against a new hbase operation other than simple file reading and
> writing (compactions?). If the PoC Interface survived in the new context,
> we'd then step back and write up a design.
> Seemed like as good a plan as any. Plan should talk about all the ways in
> which ops can go wrong.
> Thereafter, split up the work and bring over subsystems.
> It is looking like an hbase3 rather than an hbase2 project (though all hoped
> it could make hbase2).
>
> TODOs
> We agreed to post these notes with pointers to current state of FS REDO
> (See below).
> Umesh and Stack to do up a one-pager on current PoC to be posted on this
> thread or up in the FS REDO issue (HBASE-14090).
> Keep up macro status on this thread.
>
> What else?
> Thanks,
> S
>
> 1. Matteo's original FS REDO suggested plan:
> https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
> 2. Umesh's PoC: https://reviews.apache.org/r/55200/
> 3. HBASE-14090 is the parent issue for this project?
> 4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
> Matteo's ideas):
> https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
>
>
> On Fri, Feb 17, 2017 at 9:53 AM, Stack <stack@duboce.net> wrote:
>
> > I put up a hangout. If above link doesn't work, try this
> > https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write
> > here if can't get in.
> >
> > St.Ack
> >
> > On Tue, Feb 14, 2017 at 12:36 PM, Stack <stack@duboce.net> wrote:
> >
> >> A few folks want to have a quick chat about the state of the proposed FS
> >> redo project. The proposal is for 10AM, this Friday morning, PST. All
> >> interested parties are invited to join (shout if 10AM PST is untenable and
> >> suggest an alternative). Below is a google hangout link that comes alive
> >> Friday morning [1].
> >>
> >> One of us will keep notes and post synopsis of discussion back here and
> >> in issue after the meeting is done.
> >>
> >> Suggest those who join try to do some background reading -- see
> >> HBASE-14439 -- so we are all around the same level of understanding when
> >> the meeting starts. Agenda will be basic intros, current state of the
> >> project (with update on most recent effort), and then expectations. Basic.
> >>
> >> Thanks,
> >> S
> >>
> >> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
> >>
> >
> >
>
