hbase-dev mailing list archives

From "York, Zach" <zy...@amazon.com>
Subject Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)
Date Sat, 18 Feb 2017 00:18:57 GMT
Thanks for the updates! I will review when I have time.

On 2/17/17, 4:16 PM, "Umesh Agashe" <uagashe@cloudera.com> wrote:

    Hi,
    
    Here is the doc that summarizes our discussion about why we think a top-down
    approach requiring radical code changes, rather than an incremental, phased
    (bottom-up) approach, will help us REDO the FS directory layout.
    
    
    https://docs.google.com/document/d/128Q0BqJY7OvHMUpEpZWKCaBrH1qDjpxxOVkX2KM46No/edit#heading=h.iyja9q78fh2j
    
    Thanks,
    Umesh
    
    
    On Fri, Feb 17, 2017 at 12:57 PM, Stack <stack@duboce.net> wrote:
    
    > Notes from this morning's online meeting @10AM PST (please fill in any
    > detail I missed):
    >
    > IN ATTENDANCE:
    > Aman Poonia
    > Umesh Agashe, Cloudera
    > Stephen Tak, AMZ
    > Zach York, AMZ
    > Francis Liu, Yahoo!
    > Ben Mau, Yahoo!
    > Sean Busbey, Cloudera
    > Ted Yu, HWX
    > Appy (Apekshit Sharma), Cloudera
    >
    >
    > BACKGROUND (St.Ack)
    > Y! want to do millions of regions in a Cluster.
    > Our current FS Layout is heavily dependent on HDFS semantics (e.g. we depend
    > heavily on HDFS rename doing atomic file and directory swaps); this complicates
    > being able to run on another FS.
    > HBase is bound to a particular physical layout in the FS.
    > Matteo Bertozzi's experience with HDFS/HBase on S3 and a general irritation
    > with how FS ops are distributed all about the codebase had him propose a
    > logical tier with a radically simplified set of requirements of underlying
    > FS (block store?); atomic operations would be done by HBase rather than
    > farmed out to the FS.
    > Matteo not w/ us anymore but he passed on the vision to Umesh
    >
    > CURRENT STATE OF FS REDO PROJECT (Umesh)
    > Currently it is shelved but hope to get back to it 'soon'.
    > Spent a few months on FS REDO at end of last year.
    > Initial approach was to abstract out three Interfaces (originally sketched by
    > Matteo in [1]).
    > Idea was to centralize all FS use in a few well-known locations.
    > Then refactor all FS usage.
    > Keep all meta data about tables, files, etc., in hbase:meta
    > Idea was to slowly migrate over ops, tools etc., to the new Interface.
    > This was a bottom-up approach, finding FS references, and moving references
    > to one place.
    > Soon found too many refs all over the code.
    > Found that we might not get to desired simple Interface because API had to
    > carry around baggage.
    > Matteo had tried this approach in [1] and started to argue this stepped
    > migration would never arrive.
    >
    > So started over w/ the ideal Simple FS Interface and the implementation
    > seemed to flow smoothly.
    > An in-memory POC that did simple file ops was posted a while back here [2].
    >
    > Given the two approaches taken above, experience indicates that the
    > radical, top-down approach is more likely to succeed.
    >
    > WHY ARE PEOPLE INTERESTED IN FS REDO?
    > Francis and Ben Mau want to be able to do 1M regions.
    > St.Ack suggested that even small installs need to be able to do more,
    > smaller regions.
    > Zach is interested because he wants to optimize HBase over S3 (rename,
    > consistency issues). Liked the idea of metadata up in hbase:meta table and
    > avoiding renames, etc.
    >
    > WHAT SHOULD WE DO?
    > We have few resources. It is a big job (We've been talking about it a good
    > while now). All docs are stale, missing the benefit of Umesh's recent
    > explorations.
    > Sean pointed out that before shelving, the idea was to try the PoC
    > Interface against a new hbase operation other than simple file reading and
    > writing (compactions?). If the PoC Interface survived in the new context,
    > we'd then step back and write up a design.
    > Seemed like as good a plan as any. Plan should talk about all the ways in
    > which ops can go wrong.
    > Thereafter, split up the work and bring over subsystems.
    > It is looking like an hbase3 rather than an hbase2 project (though all hoped it
    > could make hbase2).
    >
    > TODOs
    > We agreed to post these notes with pointers to current state of FS REDO
    > (See below).
    > Umesh and Stack to do up a one-pager on current PoC to be posted on this
    > thread or up in the FS REDO issue (HBASE-14090).
    > Keep up macro status on this thread.
    >
    > What else?
    > Thanks,
    > S
    >
    > 1. Matteo's original FS REDO suggested plan:
    > https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
    > 2. Umesh's PoC: https://reviews.apache.org/r/55200/
    > 3. HBASE-14090 is the parent issue for this project?
    > 4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
    > Matteo's ideas):
    > https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#
    >
    >
    > On Fri, Feb 17, 2017 at 9:53 AM, Stack <stack@duboce.net> wrote:
    >
    > > I put up a hangout. If above link doesn't work, try this
    > > https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write
    > > here if you can't get in.
    > >
    > > St.Ack
    > >
    > > On Tue, Feb 14, 2017 at 12:36 PM, Stack <stack@duboce.net> wrote:
    > >
    > >> A few folks want to have a quick chat about the state of the proposed FS
    > >> redo project. The proposal is for 10AM, this Friday morning, PST. All
    > >> interested parties are invited to join (shout if 10AM PST is untenable and
    > >> suggest an alternative). Below is a google hangout link that comes alive
    > >> friday morning [1].
    > >>
    > >> One of us will keep notes and post synopsis of discussion back here and
    > >> in issue after the meeting is done.
    > >>
    > >> Suggest those who join try to do some background reading -- see
    > >> HBASE-14439 -- so we are all around the same level of understanding when
    > >> the meeting starts. Agenda will be basic intros, current state of the
    > >> project (with update on most recent effort), and then expectations. Basic.
    > >>
    > >> Thanks,
    > >> S
    > >>
    > >> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
    > >>
    > >
    > >
    >
    
