From: Todd Lipcon <todd@cloudera.com>
Date: Thu, 2 Aug 2018 17:36:39 -0700
Subject: Re: Re: Recommended maximum amount of stored data per tablet server
To: user@kudu.apache.org

On Thu, Aug 2, 2018 at 4:54 PM, Quanlong Huang <huang_quanlong@126.com> wrote:

> Thanks, Adar and Todd! We'd like to contribute when we can.
>
> Are there any concerns if we share the machines with HDFS DataNodes and
> Yarn NodeManagers? The network bandwidth is 10Gbps. I think it's OK if
> they don't share the same disks, e.g. 4 disks for Kudu and the other 11
> disks for the DataNode and NodeManager, and we leave enough CPU and
> memory for Kudu. Is that right?

That should be fine. Typically we actually recommend sharing all the disks
across all of the services. There is a trade-off between static
partitioning (exclusive access to a smaller number of disks) and dynamic
sharing (potential contention but more available resources). Unless your
workload is very latency-sensitive, I usually think it's better to have the
bigger pool of resources available, even if it has to be shared with other
systems.

One recommendation, though, is to consider using a dedicated disk for the
Kudu WAL and metadata, which can help performance, since the WAL can be
sensitive to other heavy workloads monopolizing bandwidth on the same
spindle.

-Todd

> At 2018-08-03 02:26:37, "Todd Lipcon" <todd@cloudera.com> wrote:
>
> +1 to what Adar said.
>
> One tension we currently have for scaling is that we don't want to let
> individual tablets grow too large, because of problems like the
> superblock issue that Adar mentioned. However, the solution of simply
> having more tablets is not a great one either, since many of our startup
> time problems are affected primarily by the number of tablets rather
> than their size (see KUDU-38 as the prime, ancient example).
> Additionally, having lots of tablets increases Raft heartbeat traffic,
> and you may need to dial back those heartbeat intervals to keep things
> stable.
>
> All of these things can be addressed in time and with some work. If you
> are interested in working on these areas to improve density, that would
> be a great contribution.
>
> -Todd
>
> On Thu, Aug 2, 2018 at 11:17 AM, Adar Lieber-Dembo <adar@cloudera.com> wrote:
>
>> The 8TB limit isn't a hard one; it's just a reflection of the scale
>> that Kudu developers commonly test. Beyond 8TB we can't vouch for
>> Kudu's stability and performance. For example, we know that as the
>> amount of on-disk data grows, node restart times get longer and longer
>> (see KUDU-2014 for some ideas on how to improve that). Furthermore, as
>> tablets accrue more data blocks, their superblocks become larger,
>> raising the minimum amount of I/O for any operation that rewrites a
>> superblock (such as a flush or compaction). Lastly, the tablet copy
>> protocol used in rereplication tries to copy the entire superblock in
>> one RPC message; if the superblock is too large, it'll run up against
>> the default 50 MB RPC transfer size (see src/kudu/rpc/transfer.cc).
>>
>> These examples are just off the top of my head; there may be others lurking.
>> So this goes back to what I led with: beyond the recommended limit we
>> aren't quite sure how Kudu's performance and stability are affected.
>>
>> All that said, you're welcome to try it out and report back with your
>> findings.
>>
>> On Thu, Aug 2, 2018 at 7:23 AM Quanlong Huang <huang_quanlong@126.com> wrote:
>> >
>> > Hi all,
>> >
>> > In the "Known Issues and Limitations" document, it's recommended that
>> > the "maximum amount of stored data, post-replication and
>> > post-compression, per tablet server is 8TB". How is the 8TB calculated?
>> >
>> > We have some machines, each with 15 * 4TB spinning disk drives, 256GB
>> > RAM, and 48 CPU cores. Does that mean the other 52 TB (= 15 * 4 - 8)
>> > of space is recommended to be left for other systems? We would prefer
>> > to make the machines dedicated to Kudu. Can the tablet server leverage
>> > the whole space efficiently?
>> >
>> > Thanks,
>> > Quanlong
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
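
A note on putting the dedicated WAL/metadata disk suggestion into flag
form: the tablet server's directory flags can be split across devices. A
minimal sketch, assuming one of the fifteen disks (the /data/N mount
points below are hypothetical) is set aside for the WAL and metadata
while the rest hold data blocks:

    kudu-tserver \
      --fs_wal_dir=/data/0/kudu/wal \
      --fs_metadata_dir=/data/0/kudu/meta \
      --fs_data_dirs=/data/1/kudu,/data/2/kudu,/data/3/kudu  # one entry per remaining data disk

--fs_metadata_dir is optional; if it is left unset, tablet metadata is
co-located with one of the other directories (which one depends on the
Kudu version), so setting it explicitly keeps metadata on the same
dedicated spindle as the WAL.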
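
On the 50 MB RPC transfer size mentioned above: it corresponds to the
rpc_max_message_size gflag defined in src/kudu/rpc/transfer.cc. A hedged
sketch of raising it on the tablet servers involved in tablet copies
(confirm the flag and its implications for your Kudu version before
changing it):

    kudu-tserver --rpc_max_message_size=104857600  # 100 MB instead of the 50 MB default

Raising the ceiling only works around very large superblocks; keeping
per-tablet data, and therefore superblock size, modest is the more
durable approach, as the thread discusses.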
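
Similarly, the heartbeat-interval dial-back mentioned for very high tablet
counts maps to a Raft timing flag. A sketch, assuming the stock 500 ms
interval (verify the default and the failure-detection timeouts derived
from it in your version):

    kudu-tserver --raft_heartbeat_interval_ms=1000  # roughly halves heartbeat traffic vs. the 500 ms default

Lengthening the interval reduces per-tablet heartbeat load at the cost of
slower failure detection, so it is a trade-off rather than a free win.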