hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: [DISCUSS] FileSystem Quotas in HBase
Date Thu, 03 Nov 2016 15:51:52 GMT
Great points. Thanks fellas!

Andrew Purtell wrote:
> Right, these drawbacks are what I was getting at with "impose limits on
> how HBase structures storage on the filesystem". It does imply major
> changes to the filesystem structure and multiple WALs. I didn't think of
> snapshots; you are right that they make filesystem reorganization more
> complicated. To your earlier point, I think soft quotas are a fine start as
> well.
> On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar <enis.soz@gmail.com> wrote:
>> Thanks Andrew,
>> I forgot to mention that we also considered using HDFS quota enforcement
>> directly, but decided against it for a few reasons.
>>   - Our current layout has files in the data directory, as well as in the
>> archive directory, the WALs directory, etc. Since an HDFS quota cannot span
>> multiple directories, we could only use HDFS quotas for the main data
>> files, and not for snapshots, etc., unless we do major surgery on our file
>> layout. This will get more complicated if we later want to move to a flat
>> layout, etc.
>>   - Since WALs would not belong to any namespace unless we do
>> WAL-per-namespace, once a single NS's HDFS quota is reached, it might
>> affect everybody else and potentially cause havoc on the cluster. The
>> problem is that if a single NS is out of space, its regions cannot flush
>> at all. This would cause the WALs to back up and be kept forever, affecting
>> all of the other regions from different tables/namespaces and causing
>> unavailability for unrelated tables. WAL-per-namespace would also have to
>> be implemented, and the WALs moved under a shared NS directory so that the
>> data and WALs share one quota, requiring further layout changes. It also
>> would not be optimal if there is a large number of namespaces.
>>   - It would only work with HDFS, while HBase can use other filesystems.
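The second point hinges on how shared WALs get cleaned up. The toy model below is a sketch, not HBase code: the class and function names are hypothetical, and it assumes the simplified rule that a WAL file can be archived only after every region with edits in it has flushed those edits. It shows one over-quota namespace pinning a WAL that a healthy namespace also wrote to.

```python
from dataclasses import dataclass, field


@dataclass
class WalFile:
    """Hypothetical shared WAL file: tracks regions whose edits are unflushed."""
    name: str
    unflushed_regions: set = field(default_factory=set)


def flush(wals, region, namespace_over_quota):
    """Flush one region; returns False if its namespace is out of space."""
    if namespace_over_quota:
        return False  # the flush cannot write its new store file
    for wal in wals:
        wal.unflushed_regions.discard(region)  # edits now live in store files
    return True


def archivable(wals):
    """Names of WALs that no region depends on any more."""
    return [wal.name for wal in wals if not wal.unflushed_regions]


# One RegionServer, two WALs shared by regions of namespaces ns1 and ns2.
wals = [
    WalFile("wal.1", {"ns1:t1,r1", "ns2:t2,r1"}),
    WalFile("wal.2", {"ns2:t2,r1"}),
]
over_quota = {"ns1": True, "ns2": False}

flushed = {}
for region in ["ns1:t1,r1", "ns2:t2,r1"]:
    namespace = region.split(":")[0]
    flushed[region] = flush(wals, region, over_quota[namespace])

print(flushed)           # {'ns1:t1,r1': False, 'ns2:t2,r1': True}
print(archivable(wals))  # ['wal.2']; wal.1 stays pinned by the ns1 region
```

In this model, every later WAL that receives an edit from the stuck ns1 region is pinned the same way, so unarchivable WALs accumulate for every tenant of the RegionServer, not just ns1.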
>> Enis
>> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>> Another approach to hard limits could be pushing the quota down to the
>>> HDFS level, because HDFS would have a very accurate assessment of quota
>>> utilization at all times, but this would only work with HDFS and impose
>>> limits on how HBase structures storage on the filesystem (e.g. all files
>>> for a namespace must be under a common root). Still, implementation would
>>> be "easy": once over the hard quota, all allocations would fail, and the
>>> bulk of the effort would be hardening our response to allocation failures.
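HDFS space quotas are set per directory (for example with `hdfs dfsadmin -setSpaceQuota`) and charge every file beneath that directory, which is why this approach needs all of a namespace's files under one root. The toy model below is a sketch with hypothetical class names; it ignores replication (which real HDFS space quotas do count), and its paths only approximate HBase's default layout. Allocations that would push a quota'd subtree over its limit fail, while archived files and WALs outside the subtree are never charged to the namespace.

```python
from pathlib import PurePosixPath


class QuotaExceeded(Exception):
    """Raised when an allocation would push a directory over its quota."""


class SubtreeQuotaFS:
    """Toy filesystem with per-directory space quotas (replication ignored)."""

    def __init__(self):
        self.quotas = {}  # directory -> max bytes for everything beneath it
        self.files = {}   # file path -> size in bytes

    def set_space_quota(self, directory, max_bytes):
        self.quotas[PurePosixPath(directory)] = max_bytes

    def used_under(self, directory):
        directory = PurePosixPath(directory)
        return sum(size for path, size in self.files.items()
                   if directory in path.parents)

    def allocate(self, path, size):
        path = PurePosixPath(path)
        # Every ancestor directory that has a quota must have room for the bytes.
        for directory, limit in self.quotas.items():
            if directory in path.parents and self.used_under(directory) + size > limit:
                raise QuotaExceeded(f"{directory} would exceed {limit} bytes")
        self.files[path] = self.files.get(path, 0) + size


fs = SubtreeQuotaFS()
fs.set_space_quota("/hbase/data/ns1", 100)
fs.allocate("/hbase/data/ns1/t1/hfile-a", 80)
# Neither of these is under /hbase/data/ns1, so neither is charged to ns1.
fs.allocate("/hbase/archive/data/ns1/t1/hfile-old", 500)
fs.allocate("/hbase/WALs/rs1/wal.1", 500)
try:
    fs.allocate("/hbase/data/ns1/t1/hfile-b", 30)  # 80 + 30 > 100
except QuotaExceeded as err:
    print("rejected:", err)  # rejected: /hbase/data/ns1 would exceed 100 bytes
print(fs.used_under("/hbase/data/ns1"))  # 80
```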
>>> On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <enis@apache.org> wrote:
>>>> Thanks Josh for the doc and for pursuing this.
>>>> I was involved with some of the design choices, so consider me a +1 on the
>>>> general approach. One topic not covered here is an alternative design we
>>>> could have pursued: stricter control of quota usage, so that we would
>>>> always guarantee that a namespace/table cannot use more than its allocated
>>>> disk space. This hard-limit approach would differ from the proposed
>>>> "soft-limit" approach, which can end up overusing disk space by a small
>>>> amount (because it takes time to detect that the quota limit has been
>>>> reached and to enforce the limit).
>>>> The hard-limit approach may be built with a lease-like mechanism, where
>>>> the master gives out disk-space leases to region servers from the
>>>> remaining limit, and the regionservers make sure that they do not allocate
>>>> more space than their lease allows. By ensuring that space is
>>>> pre-allocated via leases, we can always guarantee that strict limits are
>>>> applied. However, this approach would be harder to build and stabilize: it
>>>> needs new mechanisms for distributing and managing these leases, and
>>>> tuning the allocations so that regionservers never block flushes or
>>>> compactions for lack of a timely lease would be challenging to get right.
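The lease idea can be sketched roughly as below. This is a minimal illustration under assumptions not taken from the design doc: one master-side pool per quota, fixed-size lease requests, and no lease expiry or return. All class names are hypothetical.

```python
class LeaseMaster:
    """Hypothetical master-side pool of not-yet-leased quota."""

    def __init__(self, quota_bytes):
        self.remaining = quota_bytes  # quota not yet leased to any server

    def grant(self, requested):
        """Lease out up to `requested` bytes; may grant less, or nothing."""
        granted = min(requested, self.remaining)
        self.remaining -= granted
        return granted


class LeasedRegionServer:
    """Hypothetical RegionServer that never writes beyond its lease."""

    def __init__(self, master, lease_chunk):
        self.master = master
        self.lease_chunk = lease_chunk  # how much to ask for at a time
        self.lease = 0                  # bytes this server may still write

    def try_write(self, nbytes):
        if nbytes > self.lease:
            shortfall = nbytes - self.lease
            self.lease += self.master.grant(max(self.lease_chunk, shortfall))
        if nbytes > self.lease:
            return False  # a flush or compaction would have to wait here
        self.lease -= nbytes
        return True


master = LeaseMaster(quota_bytes=100)
rs1 = LeasedRegionServer(master, lease_chunk=40)
rs2 = LeasedRegionServer(master, lease_chunk=40)
results = [rs1.try_write(10), rs2.try_write(10),
           rs2.try_write(30), rs2.try_write(30)]
print(results)                                 # [True, True, True, False]
print(master.remaining, rs1.lease, rs2.lease)  # 0 30 20
```

Because every byte is leased before it is written, total writes can never exceed the quota. The last write fails, though, even though only 50 of the 100 bytes have been written: 30 bytes sit idle in rs1's lease. Sizing leases, and reclaiming idle ones fast enough that flushes and compactions never stall, is the tuning problem described above.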
>>>> We generally think that the "soft-limit" approach is a good enough
>>>> approximation, and that the error bounds on over-allocation would be small
>>>> and negligible in production. Thus, the proposal is to implement the soft
>>>> approach, with good documentation about how much space can be
>>>> over-allocated in a worst-case scenario.
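For that documentation, a back-of-the-envelope bound may help. The sketch below rests on a simplifying assumption not taken from the design doc: a table keeps growing at a steady rate until its violation has been reported, evaluated by the Master, and enforced by the RegionServers. The function name, parameters, and numbers are illustrative only.

```python
def worst_case_overshoot_bytes(growth_bytes_per_s, report_period_s,
                               evaluation_period_s, enforcement_delay_s):
    """Rough upper estimate of how far usage can exceed a soft quota.

    Assumes (hypothetically) steady growth that continues for one full
    size-reporting period, one full quota-evaluation period on the Master,
    and the delay before RegionServers start enforcing the violation.
    """
    window_s = report_period_s + evaluation_period_s + enforcement_delay_s
    return growth_bytes_per_s * window_s


MB = 1024 * 1024
# Illustrative numbers only: 50 MB/s of growth, 60 s reporting period,
# 60 s evaluation period, 5 s to start enforcement.
overshoot = worst_case_overshoot_bytes(50 * MB, 60, 60, 5)
print(overshoot / MB, "MB")  # 6250.0 MB
```

Whether that counts as negligible depends on the quota: against a quota of a few terabytes it is well under 1%, while a small quota on a write-heavy table could be overshot several times over.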
>>>> Enis
>>>> On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <elserj@apache.org> wrote:
>>>>> Thanks for the reviews so far, Ted and Stack. The comments were great and
>>>>> much appreciated.
>>>>> Interpreting consensus from the lack of objection, I'm going to move ahead
>>>>> in earnest and start working on what was described in the doc. Expect to
>>>>> see some work broken out under HBASE-16961 and patches starting to land.
>>>>> I'm also happy to entertain more discussion if anyone hasn't found the
>>>>> time to read/comment yet.
>>>>> Thanks!
>>>>> - Josh
>>>>> Josh Elser wrote:
>>>>>> Sure thing, Ted.
>>>>>> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
>>>>>> Let me open an umbrella issue for now. I can break up the work later.
>>>>>> https://issues.apache.org/jira/browse/HBASE-16961
>>>>>> Ted Yu wrote:
>>>>>>> Josh:
>>>>>>> Can you put the doc in a Google Doc so that people can comment on it?
>>>>>>> Is there a JIRA opened for this work?
>>>>>>> Please open one if there is none.
>>>>>>> Thanks
>>>>>>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser <elserj@apache.org> wrote:
>>>>>>>> Hi folks,
>>>>>>>> I'd like to propose the introduction of FileSystem quotas in HBase.
>>>>>>>> Here's a design doc[1] which (hopefully) covers all of the salient
>>>>>>>> points of what I think an initial version of such a feature would
>>>>>>>> include.
>>>>>>>> tl;dr We can define quotas on tables and namespaces. Region size is
>>>>>>>> computed by RegionServers and sent to the Master. The Master inspects
>>>>>>>> the sizes of Regions, rolling them up to table and namespace sizes.
>>>>>>>> Quotas in the quota table are evaluated against the computed sizes,
>>>>>>>> and for those tables/namespaces violating their quota, RegionServers
>>>>>>>> are to take some action to limit any further filesystem growth by
>>>>>>>> that table/namespace.
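The roll-up in the tl;dr can be sketched in a few lines. This is a minimal sketch with hypothetical names and data shapes (a dict keyed by (namespace, table, region) standing in for RegionServer reports); it is not the design doc's actual classes, and it leaves out how sizes are computed, reported, and turned into enforcement actions.

```python
from collections import defaultdict


def roll_up(region_sizes):
    """Sum region sizes into per-table and per-namespace totals.

    region_sizes maps (namespace, table, region) -> bytes, standing in for
    the sizes RegionServers report to the Master.
    """
    table_sizes = defaultdict(int)
    namespace_sizes = defaultdict(int)
    for (namespace, table, _region), size in region_sizes.items():
        table_sizes[(namespace, table)] += size
        namespace_sizes[namespace] += size
    return table_sizes, namespace_sizes


def find_violations(table_sizes, namespace_sizes, table_quotas, namespace_quotas):
    """Return the tables and namespaces whose usage exceeds their quota."""
    violating = set()
    for table, limit in table_quotas.items():
        if table_sizes.get(table, 0) > limit:
            violating.add(("table", table))
    for namespace, limit in namespace_quotas.items():
        if namespace_sizes.get(namespace, 0) > limit:
            violating.add(("namespace", namespace))
    return violating


reports = {
    ("ns1", "t1", "r1"): 70,
    ("ns1", "t1", "r2"): 50,
    ("ns1", "t2", "r1"): 30,
    ("default", "t3", "r1"): 10,
}
tables, namespaces = roll_up(reports)
violations = find_violations(tables, namespaces,
                             table_quotas={("ns1", "t1"): 100},
                             namespace_quotas={"ns1": 200})
print(violations)  # {('table', ('ns1', 't1'))}
```

Here t1's two regions total 120 bytes against a 100-byte table quota, so t1 violates, while ns1's total of 150 stays under its 200-byte namespace quota.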
>>>>>>>> I'd encourage you to give the document a read -- I tried to cover as
>>>>>>>> much as I could without getting unnecessarily bogged down in
>>>>>>>> implementation details.
>>>>>>>> Feedback is, of course, welcome. I'd like to start sketching a
>>>>>>>> breakdown of the work (all writing and no programming makes Josh a
>>>>>>>> sad boy). I'm happy to field any/all questions. Thanks in advance.
>>>>>>>> - Josh
>>>>>>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase.pdf
>>> --
>>> Best regards,
>>>     - Andy
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
