hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Online Meeting on embryonic FS Redo Project (HBASE-14439)
Date Fri, 17 Feb 2017 20:57:46 GMT
Notes from this morning's online meeting @10AM PST (please fill in any
detail I missed):

IN ATTENDANCE:
Aman Poonia
Umesh Agashe, Cloudera
Stephen Tak, AMZ
Zach York, AMZ
Francis Liu, Yahoo!
Ben Mau, Yahoo!
Sean Busbey, Cloudera
Ted Yu, HWX
Appy (Apekshit Sharma), Cloudera


BACKGROUND (St.Ack)
Y! want to do millions of regions in a Cluster.
Our current FS Layout is heavily dependent on HDFS semantics (e.g. we depend
heavily on HDFS rename doing atomic file and directory swaps); this
complicates being able to run on another FS.
HBase is bound to a particular physical layout in the FS.
Matteo Bertozzi's experience with HDFS/HBase on S3, and a general irritation
with how FS ops are distributed all about the codebase, had him propose a
logical tier with a radically simplified set of requirements of the
underlying FS (block store?); atomic operations would be done by HBase rather
than farmed out to the FS.
Matteo is not w/ us anymore but he passed on the vision to Umesh.

CURRENT STATE OF FS REDO PROJECT (Umesh)
Currently it is shelved but hope to get back to it 'soon'.
Spent a few months on FS REDO at end of last year.
Initial approach was to abstract out three Interfaces (originally sketched by
Matteo in [1]).
Idea was to centralize all FS use in a few well-known locations.
Then refactor all FS usage.
Keep all metadata about tables, files, etc., in hbase:meta.
Idea was to slowly migrate over ops, tools etc., to the new Interface.
This was a bottom-up approach, finding FS references, and moving references
to one place.
Soon found too many refs all over the code.
Found that we might not get to desired simple Interface because API had to
carry around baggage.
Matteo had tried this approach in [1] and started to argue this stepped
migration would never arrive.

So started over w/ the ideal Simple FS Interface and the implementation
seemed to flow smoothly.
An in-memory POC that did simple file ops was posted a while back here [2].

Given the two approaches taken above, experience indicates that the
radical, top-down approach is more likely to succeed.
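
To make the "atomic operations done by HBase rather than farmed out to the
FS" idea concrete, here is a minimal sketch of a compaction commit that needs
no rename. It is NOT Umesh's PoC [2] and not any agreed design; every class
and method name below (BlobStore, FileCatalog, CompactionSketch) is
hypothetical, and an in-memory map stands in for file metadata that would
live in hbase:meta. The only thing asked of the store is write-once put,
exists and delete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical minimal store: write-once blobs, no rename, no directories. */
class BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void put(String id, byte[] data) {
        // Refuse overwrites so a blob id always names the same bytes.
        if (blobs.putIfAbsent(id, data.clone()) != null) {
            throw new IllegalStateException("blob already exists: " + id);
        }
    }

    boolean exists(String id) { return blobs.containsKey(id); }

    void delete(String id) { blobs.remove(id); }
}

/** Hypothetical stand-in for per-region file metadata kept in hbase:meta. */
class FileCatalog {
    private final Map<String, Set<String>> liveFiles = new HashMap<>();

    /** Returns a copy of the set of files readers should see for a region. */
    synchronized Set<String> files(String region) {
        return new TreeSet<>(liveFiles.getOrDefault(region, new TreeSet<>()));
    }

    synchronized void add(String region, String fileId) {
        liveFiles.computeIfAbsent(region, r -> new TreeSet<>()).add(fileId);
    }

    /** Swaps inputs for output in one step; fails if any input is not live. */
    synchronized boolean replace(String region, Set<String> inputs, String output) {
        Set<String> current = liveFiles.get(region);
        if (current == null || !current.containsAll(inputs)) {
            return false;
        }
        current.removeAll(inputs);
        current.add(output);
        return true;
    }
}

class CompactionSketch {
    static boolean compact(BlobStore store, FileCatalog catalog, String region,
                           Set<String> inputs, String outputId, byte[] merged) {
        // 1. Write the new blob. Readers go through the catalog, so it is
        //    not visible to them yet.
        store.put(outputId, merged);
        // 2. Commit: a single metadata swap replaces the FS rename.
        if (!catalog.replace(region, inputs, outputId)) {
            store.delete(outputId); // inputs changed under us; discard output
            return false;
        }
        // 3. Clean up the inputs; nothing references them any more.
        for (String input : inputs) {
            store.delete(input);
        }
        return true;
    }
}
```

If a process dies between steps 2 and 3 in this sketch, the catalog is
already consistent and the leftover input blobs are just garbage that a
later sweep could find by comparing the store against the catalog.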

WHY ARE PEOPLE INTERESTED IN FS REDO?
Francis and Ben Mau want to be able to do 1M regions.
St.Ack suggested that even small installs need to be able to do more,
smaller regions.
Zach is interested because he wants to optimize HBase over S3 (rename,
consistency issues). He liked the idea of metadata up in the hbase:meta table
and avoiding renames, etc.

WHAT SHOULD WE DO?
We have few resources. It is a big job (We've been talking about it a good
while now). All docs are stale, missing the benefit of Umesh's recent
explorations.
Sean pointed out that before shelving, the idea was to try the PoC
Interface against a new hbase operation other than simple file reading and
writing (compactions?). If the PoC Interface survived in the new context,
we'd then step back and write up a design.
Seemed like as good a plan as any. Plan should talk about all the ways in
which ops can go wrong.
Thereafter, split up the work and bring over subsystems.
It is looking like an hbase3 rather than an hbase2 project (though all hoped
it could make hbase2).

TODOs
We agreed to post these notes with pointers to current state of FS REDO
(See below).
Umesh and Stack to do up a one-pager on current PoC to be posted on this
thread or up in the FS REDO issue (HBASE-14090).
Keep up macro status on this thread.

What else?
Thanks,
S

1. Matteo's original FS REDO suggested plan:
https://docs.google.com/document/d/1fMDanYiDAWpfKLcKUBb1Ff0BwB7zeGqzbyTFMvSOacQ/edit#
2. Umesh's PoC: https://reviews.apache.org/r/55200/
3. HBASE-14090 is the parent issue for this project?
4. An old doc. to evangelize the idea of an FS REDO (mostly upstreaming
Matteo's ideas):
https://docs.google.com/document/d/10tSCSSWPwdFqOLLYtY2aVFe6iCIrsBk4Vqm8LSGUfhQ/edit#


On Fri, Feb 17, 2017 at 9:53 AM, Stack <stack@duboce.net> wrote:

> I put up a hangout. If above link doesn't work, try this
> https://hangouts.google.com/call/aaahkufdurgctflufw4ivhsngue and write
> here if can't get in.
>
> St.Ack
>
> On Tue, Feb 14, 2017 at 12:36 PM, Stack <stack@duboce.net> wrote:
>
>> A few folks want to have a quick chat about the state of the proposed FS
>> redo project. The proposal is for 10AM, this Friday morning, PST. All
>> interested parties are invited to join (shout if 10AM PST is untenable and
>> suggest an alternative). Below is a google hangout link that comes alive
>> Friday morning [1].
>>
>> One of us will keep notes and post synopsis of discussion back here and
>> in issue after the meeting is done.
>>
>> Suggest those who join try to do some background reading -- see
>> HBASE-14439 -- so we are all around the same level of understanding when
>> the meeting starts. Agenda will be basic intros, current state of the
>> project (with update on most recent effort), and then expectations. Basic.
>>
>> Thanks,
>> S
>>
>> 1. https://plus.google.com/hangouts/_/calendar/c2FpbnQuYWNrQGdtYWlsLmNvbQ.1oaqlr00ru20s1hqrsq1q05j3k?authuser=0
>>
>
>
