From: Ted Yu
Date: Thu, 3 Nov 2016 14:50:52 -0700
Subject: Re: [DISCUSS] FileSystem Quotas in HBase
To: dev@hbase.apache.org

Thanks, Josh. Looking forward to the patches.

On Thu, Nov 3, 2016 at 12:58 PM, Josh Elser wrote:

> Done.
>
> Ted Yu wrote:
>
>> Josh:
>> Please capture the following in the design doc.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar wrote:
>>
>>> Thanks Andrew,
>>>
>>> I forgot to mention that we also considered using HDFS quota
>>> enforcement directly, but decided against it for a few reasons:
>>>
>>> - Our current layout has files in the data directory, as well as the
>>>   archive directory, WALs, etc. Since an HDFS quota cannot span
>>>   multiple directories, we could only use HDFS quotas for the main
>>>   data files, and not snapshots, etc., unless we do major surgery on
>>>   our file layouts. This gets more complicated if we later want to do
>>>   a flat layout, etc.
>>> - Since WALs would not belong to any namespace unless we implement
>>>   wal-per-namespace, once a single namespace's HDFS quota is reached,
>>>   it could affect everybody else and potentially cause havoc on the
>>>   cluster. If a single namespace is out of space, we cannot perform
>>>   flushes at all. The WALs would back up and be kept forever,
>>>   affecting regions from other tables and namespaces and causing
>>>   unavailability for unrelated tables. Wal-per-namespace would also
>>>   have to be implemented, with WALs moved under a shared namespace
>>>   directory, requiring further layout changes.
>>>   It also will not be optimal if there is a large number of
>>>   namespaces.
>>> - It will only work with HDFS, while HBase can use other file
>>>   systems.
>>>
>>> Enis
>>>
>>> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell wrote:
>>>
>>>> Another approach to hard limits could be pushing the quota down to
>>>> the HDFS level, because HDFS would have a very accurate assessment
>>>> of quota utilization at all times. But this would only work with
>>>> HDFS and would impose limits on how HBase structures storage on the
>>>> filesystem (e.g. all files for a namespace must be under a common
>>>> root). Still, implementation would be "easy": over the hard quota,
>>>> all allocations would fail; the bulk of the effort is hardening the
>>>> response to allocation failures.
>>>>
>>>> On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar wrote:
>>>>
>>>>> Thanks Josh for the doc and for pursuing this.
>>>>>
>>>>> I was involved with some of the design choices, so consider me a +1
>>>>> on the general approach. One topic not covered here is the other
>>>>> design decision we could have pursued: stricter control over quota
>>>>> usage, so that we would always guarantee that a namespace or table
>>>>> cannot use more than its allocated disk space. This hard-limit
>>>>> approach would differ from the proposed soft-limit approach because
>>>>> the soft-limit approach can overuse disk space by a small amount
>>>>> (since it takes time to detect that the quota limit has been
>>>>> reached and to enforce the limit).
>>>>>
>>>>> The hard-limit approach could be built with a lease-like mechanism,
>>>>> where the master grants disk-space leases to region servers out of
>>>>> the remaining limit, and the regionservers ensure that they never
>>>>> allocate more space than their leases allow.
>>>>> By ensuring that the space is pre-allocated via leases, we can
>>>>> always make sure that strict limits are applied. However, this
>>>>> approach would be harder to build and stabilize: it would need new
>>>>> mechanisms for distributing and managing these leases, and tuning
>>>>> the allocations so that regionservers never block flushes or
>>>>> compactions for lack of a timely lease would prove challenging to
>>>>> get right.
>>>>>
>>>>> We generally think that the soft-limit approach is a good enough
>>>>> approximation, and that the error bounds on over-allocation would
>>>>> be minimal and negligible in production. Thus, the proposal is to
>>>>> implement the soft approach, with good documentation about how much
>>>>> space can be over-allocated in a worst-case scenario.
>>>>>
>>>>> Enis
>>>>>
>>>>> On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser wrote:
>>>>>
>>>>>> Thanks for the reviews so far, Ted and Stack. The comments were
>>>>>> great and much appreciated.
>>>>>>
>>>>>> Interpreting consensus from lack of objection, I'm going to move
>>>>>> ahead in earnest and start working on what was described in the
>>>>>> doc. Expect to see some work break-out happening under HBASE-16961
>>>>>> and patches starting to land.
>>>>>>
>>>>>> I'm also happy to entertain more discussion if anyone hasn't found
>>>>>> the time to read/comment yet.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> - Josh
>>>>>>
>>>>>> Josh Elser wrote:
>>>>>>
>>>>>>> Sure thing, Ted.
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
>>>>>>>
>>>>>>> Let me open an umbrella issue for now. I can break up the work
>>>>>>> later.
>>>>>>> https://issues.apache.org/jira/browse/HBASE-16961
>>>>>>>
>>>>>>> Ted Yu wrote:
>>>>>>>
>>>>>>>> Josh:
>>>>>>>> Can you put the doc in a Google Doc so that people can comment
>>>>>>>> on it?
>>>>>>>>
>>>>>>>> Is there a JIRA opened for this work? Please open one if there
>>>>>>>> is none.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I'd like to propose the introduction of FileSystem quotas to
>>>>>>>>> HBase. Here's a design doc [1] which (hopefully) covers all of
>>>>>>>>> the salient points of what I think an initial version of such a
>>>>>>>>> feature would include.
>>>>>>>>>
>>>>>>>>> tl;dr We can define quotas on tables and namespaces. Region
>>>>>>>>> size is computed by RegionServers and sent to the Master. The
>>>>>>>>> Master inspects the sizes of Regions, rolling them up into
>>>>>>>>> table and namespace sizes. The quotas defined in the quota
>>>>>>>>> table are evaluated against the computed sizes, and, for those
>>>>>>>>> tables/namespaces violating a quota, RegionServers are told to
>>>>>>>>> take some action to limit any further filesystem growth by that
>>>>>>>>> table/namespace.
>>>>>>>>>
>>>>>>>>> I'd encourage you to give the document a read -- I tried to
>>>>>>>>> cover as much as I could without getting unnecessarily bogged
>>>>>>>>> down in implementation details.
>>>>>>>>>
>>>>>>>>> Feedback is, of course, welcomed. I'd like to start sketching
>>>>>>>>> out a breakdown of the work (all writing and no programming
>>>>>>>>> makes Josh a sad boy). I'm happy to field any/all questions.
>>>>>>>>> Thanks in advance.
>>>>>>>>> - Josh
>>>>>>>>>
>>>>>>>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase.pdf
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein (via Tom White)
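[Editor's note] The rollup-and-evaluate cycle in Josh's tl;dr above (RegionServers report per-region sizes, the Master sums them into table sizes and compares against the configured quotas to find violators) can be sketched in a few lines. This is a minimal illustration only; the class and method names (`QuotaRollupSketch`, `rollUpTableSizes`, `findViolators`) are hypothetical and are not part of HBase's actual implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class QuotaRollupSketch {

    // Roll up region sizes (as reported by RegionServers) into table sizes.
    // Input: table name -> (region name -> size in bytes).
    static Map<String, Long> rollUpTableSizes(Map<String, Map<String, Long>> regionSizesByTable) {
        Map<String, Long> tableSizes = new HashMap<>();
        for (Map.Entry<String, Map<String, Long>> e : regionSizesByTable.entrySet()) {
            long total = 0;
            for (long regionSize : e.getValue().values()) {
                total += regionSize;
            }
            tableSizes.put(e.getKey(), total);
        }
        return tableSizes;
    }

    // Compare rolled-up sizes against configured quotas (table -> limit in
    // bytes); return the set of tables whose usage exceeds their quota.
    // Because sizes are reported periodically, usage can briefly exceed the
    // limit before a violation is noticed -- the "soft limit" trade-off
    // discussed in the thread.
    static Set<String> findViolators(Map<String, Long> tableSizes, Map<String, Long> quotas) {
        Set<String> violators = new HashSet<>();
        for (Map.Entry<String, Long> q : quotas.entrySet()) {
            long used = tableSizes.getOrDefault(q.getKey(), 0L);
            if (used > q.getValue()) {
                violators.add(q.getKey());
            }
        }
        return violators;
    }
}
```

In the design, the Master would run something like `findViolators` on each evaluation cycle and then notify the RegionServers hosting the violating tables/namespaces to restrict further growth; namespace quotas would add one more rollup step over the table sizes.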