From: Ted Yu
Date: Thu, 3 Nov 2016 14:50:52 -0700
Subject: Re: [DISCUSS] FileSystem Quotas in HBase
To: dev@hbase.apache.org

Thanks, Josh. Looking forward to the patches.

On Thu, Nov 3, 2016 at 12:58 PM, Josh Elser wrote:

> Done.
>
> Ted Yu wrote:
>
>> Josh:
>> Please capture the following in the design doc.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar wrote:
>>
>>> Thanks Andrew,
>>>
>>> I forgot to mention that we also considered using HDFS quota
>>> enforcement directly, but decided against it for a few reasons:
>>>
>>> - Our current layout has files in the data directory, as well as the
>>>   archive directory, WALs, etc. Since an HDFS quota cannot span
>>>   multiple directories, we could only use HDFS quotas for the main
>>>   data files, and not snapshots, etc., unless we do major surgery on
>>>   our file layouts. This gets more complicated if we later want to do
>>>   a flat layout, etc.
>>> - Since WALs would not belong to any namespace unless we implement
>>>   wal-per-namespace, once a single namespace's HDFS quota is reached,
>>>   it could affect everybody else and potentially cause havoc on the
>>>   cluster. If a single namespace is out of space, we cannot perform
>>>   flushes at all. The WALs would back up and be kept forever,
>>>   affecting regions from other tables and namespaces and causing
>>>   unavailability for unrelated tables. Wal-per-namespace would also
>>>   have to be implemented, with WALs moved under a shared namespace
>>>   directory, requiring further layout changes.
>>>   It also will not be optimal if there is a large number of
>>>   namespaces.
>>> - It will only work with HDFS, while HBase can use other file
>>>   systems.
>>>
>>> Enis
>>>
>>> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell wrote:
>>>
>>>> Another approach to hard limits could be pushing the quota down to
>>>> the HDFS level, because HDFS would have a very accurate assessment
>>>> of quota utilization at all times. But this would only work with
>>>> HDFS and would impose limits on how HBase structures storage on the
>>>> filesystem (e.g. all files for a namespace must be under a common
>>>> root). Still, implementation would be "easy": over the hard quota,
>>>> all allocations would fail; the bulk of the effort is hardening the
>>>> response to allocation failures.
>>>>
>>>> On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar wrote:
>>>>
>>>>> Thanks Josh for the doc and for pursuing this.
>>>>>
>>>>> I was involved with some of the design choices, so consider me a +1
>>>>> on the general approach. One topic not covered here is the other
>>>>> design decision we could have pursued: stricter control over quota
>>>>> usage, so that we would always guarantee that a namespace or table
>>>>> cannot use more than its allocated disk space. This hard-limit
>>>>> approach would differ from the proposed soft-limit approach because
>>>>> the soft-limit approach can overuse disk space by a small amount
>>>>> (since it takes time to detect that the quota limit has been
>>>>> reached and to enforce the limit).
>>>>>
>>>>> The hard-limit approach could be built with a lease-like mechanism,
>>>>> where the master grants disk-space leases to region servers out of
>>>>> the remaining limit, and the regionservers ensure that they never
>>>>> allocate more space than their leases allow.
>>>>> By ensuring that the space is pre-allocated via leases, we can
>>>>> always make sure that strict limits are applied. However, this
>>>>> approach would be harder to build and stabilize: it would need new
>>>>> mechanisms for distributing and managing these leases, and tuning
>>>>> the allocations so that regionservers never block flushes or
>>>>> compactions for lack of a timely lease would prove challenging to
>>>>> get right.
>>>>>
>>>>> We generally think that the soft-limit approach is a good enough
>>>>> approximation, and that the error bounds on over-allocation would
>>>>> be minimal and negligible in production. Thus, the proposal is to
>>>>> implement the soft approach, with good documentation about how much
>>>>> space can be over-allocated in a worst-case scenario.
>>>>>
>>>>> Enis
>>>>>
>>>>> On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser wrote:
>>>>>
>>>>>> Thanks for the reviews so far, Ted and Stack. The comments were
>>>>>> great and much appreciated.
>>>>>>
>>>>>> Interpreting consensus from lack of objection, I'm going to move
>>>>>> ahead in earnest and start working on what was described in the
>>>>>> doc. Expect to see some work break-out happening under HBASE-16961
>>>>>> and patches starting to land.
>>>>>>
>>>>>> I'm also happy to entertain more discussion if anyone hasn't found
>>>>>> the time to read/comment yet.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> - Josh
>>>>>>
>>>>>> Josh Elser wrote:
>>>>>>
>>>>>>> Sure thing, Ted.
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
>>>>>>>
>>>>>>> Let me open an umbrella issue for now. I can break up the work
>>>>>>> later.
>>>>>>> https://issues.apache.org/jira/browse/HBASE-16961
>>>>>>>
>>>>>>> Ted Yu wrote:
>>>>>>>
>>>>>>>> Josh:
>>>>>>>> Can you put the doc in a Google Doc so that people can comment
>>>>>>>> on it?
>>>>>>>>
>>>>>>>> Is there a JIRA opened for this work? Please open one if there
>>>>>>>> is none.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I'd like to propose the introduction of FileSystem quotas to
>>>>>>>>> HBase. Here's a design doc [1] which (hopefully) covers all of
>>>>>>>>> the salient points of what I think an initial version of such a
>>>>>>>>> feature would include.
>>>>>>>>>
>>>>>>>>> tl;dr We can define quotas on tables and namespaces. Region
>>>>>>>>> size is computed by RegionServers and sent to the Master. The
>>>>>>>>> Master inspects the sizes of Regions, rolling them up into
>>>>>>>>> table and namespace sizes. The quotas defined in the quota
>>>>>>>>> table are evaluated against the computed sizes, and, for those
>>>>>>>>> tables/namespaces violating a quota, RegionServers are told to
>>>>>>>>> take some action to limit any further filesystem growth by that
>>>>>>>>> table/namespace.
>>>>>>>>>
>>>>>>>>> I'd encourage you to give the document a read -- I tried to
>>>>>>>>> cover as much as I could without getting unnecessarily bogged
>>>>>>>>> down in implementation details.
>>>>>>>>>
>>>>>>>>> Feedback is, of course, welcomed. I'd like to start sketching
>>>>>>>>> out a breakdown of the work (all writing and no programming
>>>>>>>>> makes Josh a sad boy). I'm happy to field any/all questions.
>>>>>>>>> Thanks in advance.
>>>>>>>>> - Josh
>>>>>>>>>
>>>>>>>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase.pdf
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein (via Tom White)
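[Editor's note] The rollup-and-evaluate cycle in Josh's tl;dr above (RegionServers report per-region sizes, the Master sums them into table sizes and compares against the configured quotas to find violators) can be sketched in a few lines. This is a minimal illustration only; the class and method names (`QuotaRollupSketch`, `rollUpTableSizes`, `findViolators`) are hypothetical and are not part of HBase's actual implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class QuotaRollupSketch {

    // Roll up region sizes (as reported by RegionServers) into table sizes.
    // Input: table name -> (region name -> size in bytes).
    static Map<String, Long> rollUpTableSizes(Map<String, Map<String, Long>> regionSizesByTable) {
        Map<String, Long> tableSizes = new HashMap<>();
        for (Map.Entry<String, Map<String, Long>> e : regionSizesByTable.entrySet()) {
            long total = 0;
            for (long regionSize : e.getValue().values()) {
                total += regionSize;
            }
            tableSizes.put(e.getKey(), total);
        }
        return tableSizes;
    }

    // Compare rolled-up sizes against configured quotas (table -> limit in
    // bytes); return the set of tables whose usage exceeds their quota.
    // Because sizes are reported periodically, usage can briefly exceed the
    // limit before a violation is noticed -- the "soft limit" trade-off
    // discussed in the thread.
    static Set<String> findViolators(Map<String, Long> tableSizes, Map<String, Long> quotas) {
        Set<String> violators = new HashSet<>();
        for (Map.Entry<String, Long> q : quotas.entrySet()) {
            long used = tableSizes.getOrDefault(q.getKey(), 0L);
            if (used > q.getValue()) {
                violators.add(q.getKey());
            }
        }
        return violators;
    }
}
```

In the design, the Master would run something like `findViolators` on each evaluation cycle and then notify the RegionServers hosting the violating tables/namespaces to restrict further growth; namespace quotas would add one more rollup step over the table sizes.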