From: Todd Lipcon <todd@cloudera.com>
Date: Thu, 2 Aug 2018 17:36:39 -0700
Subject: Re: Re: Recommended maximum amount of stored data per tablet server
To: user@kudu.apache.org

On Thu, Aug 2, 2018 at 4:54 PM, Quanlong Huang <huang_quanlong@126.com> wrote:

> Thanks, Adar and Todd! We'd like to contribute when we can.
>
> Are there any concerns if we share the machines with HDFS DataNodes and
> Yarn NodeManagers? The network bandwidth is 10Gbps. I think it's OK if
> they don't share the same disks, e.g. 4 disks for Kudu and the other 11
> disks for the DataNode and NodeManager, and we leave enough CPU and
> memory for Kudu. Is that right?

That should be fine. Typically we actually recommend sharing all the disks
across all of the services. There is a trade-off between static
partitioning (exclusive access to a smaller number of disks) and dynamic
sharing (potential contention but more available resources). Unless your
workload is very latency-sensitive, I usually think it's better to have the
bigger pool of resources available, even if it has to be shared with other
systems.

One recommendation, though, is to consider using a dedicated disk for the
Kudu WAL and metadata, which can help performance, since the WAL can be
sensitive to other heavy workloads monopolizing bandwidth on the same
spindle.

-Todd

> At 2018-08-03 02:26:37, "Todd Lipcon" <todd@cloudera.com> wrote:
>
> +1 to what Adar said.
>
> One tension we currently have for scaling is that we don't want to let
> individual tablets grow too large, because of problems like the
> superblock issue that Adar mentioned. However, the solution of simply
> having more tablets is not a great one either, since many of our startup
> time problems are affected primarily by the number of tablets rather
> than their size (see KUDU-38 as the prime, ancient example).
> Additionally, having lots of tablets increases Raft heartbeat traffic,
> and you may need to dial back those heartbeat intervals to keep things
> stable.
>
> All of these things can be addressed in time and with some work. If you
> are interested in working on these areas to improve density, that would
> be a great contribution.
>
> -Todd
>
> On Thu, Aug 2, 2018 at 11:17 AM, Adar Lieber-Dembo <adar@cloudera.com> wrote:
>
>> The 8TB limit isn't a hard one; it's just a reflection of the scale
>> that Kudu developers commonly test. Beyond 8TB we can't vouch for
>> Kudu's stability and performance. For example, we know that as the
>> amount of on-disk data grows, node restart times get longer and longer
>> (see KUDU-2014 for some ideas on how to improve that). Furthermore, as
>> tablets accrue more data blocks, their superblocks become larger,
>> raising the minimum amount of I/O for any operation that rewrites a
>> superblock (such as a flush or compaction). Lastly, the tablet copy
>> protocol used in rereplication tries to copy the entire superblock in
>> one RPC message; if the superblock is too large, it'll run up against
>> the default 50 MB RPC transfer size (see src/kudu/rpc/transfer.cc).
>>
>> These examples are just off the top of my head; there may be others lurking.
>> So this goes back to what I led with: beyond the recommended limit we
>> aren't quite sure how Kudu's performance and stability are affected.
>>
>> All that said, you're welcome to try it out and report back with your
>> findings.
>>
>> On Thu, Aug 2, 2018 at 7:23 AM Quanlong Huang <huang_quanlong@126.com> wrote:
>> >
>> > Hi all,
>> >
>> > In the "Known Issues and Limitations" document, it's recommended that
>> > the "maximum amount of stored data, post-replication and
>> > post-compression, per tablet server is 8TB". How is the 8TB calculated?
>> >
>> > We have some machines, each with 15 * 4TB spinning disk drives, 256GB
>> > RAM, and 48 CPU cores. Does that mean the other 52 TB (= 15 * 4 - 8)
>> > of space is recommended to be left for other systems? We would prefer
>> > to make the machines dedicated to Kudu. Can the tablet server leverage
>> > the whole space efficiently?
>> >
>> > Thanks,
>> > Quanlong
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
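
A note on putting the dedicated WAL/metadata disk suggestion into flag
form: the tablet server's directory flags can be split across devices. A
minimal sketch, assuming one of the fifteen disks (the /data/N mount
points below are hypothetical) is set aside for the WAL and metadata
while the rest hold data blocks:

    kudu-tserver \
      --fs_wal_dir=/data/0/kudu/wal \
      --fs_metadata_dir=/data/0/kudu/meta \
      --fs_data_dirs=/data/1/kudu,/data/2/kudu,/data/3/kudu  # one entry per remaining data disk

--fs_metadata_dir is optional; if it is left unset, tablet metadata is
co-located with one of the other directories (which one depends on the
Kudu version), so setting it explicitly keeps metadata on the same
dedicated spindle as the WAL.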
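
On the 50 MB RPC transfer size mentioned above: it corresponds to the
rpc_max_message_size gflag defined in src/kudu/rpc/transfer.cc. A hedged
sketch of raising it on the tablet servers involved in tablet copies
(confirm the flag and its implications for your Kudu version before
changing it):

    kudu-tserver --rpc_max_message_size=104857600  # 100 MB instead of the 50 MB default

Raising the ceiling only works around very large superblocks; keeping
per-tablet data, and therefore superblock size, modest is the more
durable approach, as the thread discusses.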
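
Similarly, the heartbeat-interval dial-back mentioned for very high tablet
counts maps to a Raft timing flag. A sketch, assuming the stock 500 ms
interval (verify the default and the failure-detection timeouts derived
from it in your version):

    kudu-tserver --raft_heartbeat_interval_ms=1000  # roughly halves heartbeat traffic vs. the 500 ms default

Lengthening the interval reduces per-tablet heartbeat load at the cost of
slower failure detection, so it is a trade-off rather than a free win.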