From: Pavel Martynov
Date: Thu, 27 Jun 2019 09:58:27 +0300
Subject: Re: WAL size estimation
To: user@kudu.apache.org

Hi Todd,

This tablet disappeared from the WAL path. I think it was a time partition
that we had already removed.

On Thu, 27 Jun 2019 at 08:58, Todd Lipcon wrote:

> Hey Pavel,
>
> I went back and looked at the source here. It appears that 24MB is the
> expected size for an index file -- each entry is 24 bytes and the index
> file should keep 1M entries.
>
> That said, for a "cold tablet" (in which you'd have only a small number of
> actual WAL files) I would expect only a single index file. The example you
> gave where you have 12 index files but only one WAL segment seems quite
> fishy to me. Having 12 index files indicates you have 12M separate WAL
> entries, but given you have only 8MB of WAL, that indicates each entry is
> less than one byte large, which doesn't make much sense at all.
>
> If you go back and look at that same tablet now, did it eventually GC
> those log index files?
>
> -Todd
>
> On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov wrote:
>
>> > Try adding the '-p' flag here? That should show preallocated extents.
>> Would be interesting to run it on some index file which is larger than 1MB,
>> for example.
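The arithmetic above is easy to sanity-check. A quick shell sketch using only the numbers quoted in this thread (24-byte entries, 1M entries per index file, 12 index files against one 8 MiB segment) -- this is back-of-envelope math, not Kudu code:

```shell
# Expected size of one full log index file, assuming 24-byte entries
# and 1M entries per file, as stated in the reply above.
entry_bytes=24
entries_per_index=$((1024 * 1024))
index_mib=$(( entry_bytes * entries_per_index / 1024 / 1024 ))
echo "full index file: ${index_mib} MiB"    # 24 MiB

# 12 index files imply ~12M WAL entries; with only 8 MiB of WAL that
# works out to less than one byte per entry, hence the "fishy" verdict.
wal_bytes=$((8 * 1024 * 1024))
implied_entries=$((12 * entries_per_index))
echo "WAL bytes per implied entry: $(( wal_bytes / implied_entries ))"   # 0
```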
>>
>> # du -h --apparent-size index.000000108
>> 23M     index.000000108
>>
>> # du -h index.000000108
>> 23M     index.000000108
>>
>> # xfs_bmap -v -p index.000000108
>> index.000000108:
>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL FLAGS
>>    0: [0..2719]:       1175815920..1175818639   2 (3704560..3707279)   2720 00000
>>    1: [2720..5111]:    1175828904..1175831295   2 (3717544..3719935)   2392 00000
>>    2: [5112..7767]:    1175835592..1175838247   2 (3724232..3726887)   2656 00000
>>    3: [7768..10567]:   1175849896..1175852695   2 (3738536..3741335)   2800 00000
>>    4: [10568..15751]:  1175877808..1175882991   2 (3766448..3771631)   5184 00000
>>    5: [15752..18207]:  1175898864..1175901319   2 (3787504..3789959)   2456 00000
>>    6: [18208..20759]:  1175909192..1175911743   2 (3797832..3800383)   2552 00000
>>    7: [20760..23591]:  1175921616..1175924447   2 (3810256..3813087)   2832 00000
>>    8: [23592..26207]:  1175974872..1175977487   2 (3863512..3866127)   2616 00000
>>    9: [26208..28799]:  1175989496..1175992087   2 (3878136..3880727)   2592 00000
>>   10: [28800..31199]:  1175998552..1176000951   2 (3887192..3889591)   2400 00000
>>   11: [31200..33895]:  1176008336..1176011031   2 (3896976..3899671)   2696 00000
>>   12: [33896..36591]:  1176031696..1176034391   2 (3920336..3923031)   2696 00000
>>   13: [36592..39191]:  1176037440..1176040039   2 (3926080..3928679)   2600 00000
>>   14: [39192..41839]:  1176072008..1176074655   2 (3960648..3963295)   2648 00000
>>   15: [41840..44423]:  1176097752..1176100335   2 (3986392..3988975)   2584 00000
>>   16: [44424..46879]:  1176132144..1176134599   2 (4020784..4023239)   2456 00000
>>
>> On Wed, 19 Jun 2019 at 10:56, Todd Lipcon wrote:
>>
>>>
>>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov wrote:
>>>
>>>> Hi Todd, thanks for the answer!
>>>>
>>>> > Any chance you've done something like copy the files away and back
>>>> that might cause them to lose their sparseness?
>>>>
>>>> No, I don't think so. Recently we experienced some stability problems
>>>> with Kudu and ran the rebalancer a couple of times, if that is related.
>>>> But we never used fs commands like cp/mv against Kudu dirs.
>>>>
>>>> I ran du on the all-WALs dir:
>>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> And on a WAL dir with many index files:
>>>> # du -sh --apparent-size
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>> # du -sh
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>> > Also, any chance you're using XFS here?
>>>>
>>>> Yes, exactly XFS. We use CentOS 7.6.
>>>>
>>>> Interestingly, there are not many holes in the index files in
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL dir
>>>> I mentioned before). Only a single hole in a single index file (of 13 files):
>>>> # xfs_bmap -v index.000000120
>>>
>>> Try adding the '-p' flag here? That should show preallocated extents.
>>> Would be interesting to run it on some index file which is larger than 1MB,
>>> for example.
>>>
>>>> index.000000120:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL
>>>>    0: [0..4231]:       1176541248..1176545479   2 (4429888..4434119)   4232
>>>>    1: [4232..9815]:    1176546592..1176552175   2 (4435232..4440815)   5584
>>>>    2: [9816..11583]:   1176552832..1176554599   2 (4441472..4443239)   1768
>>>>    3: [11584..13319]:  1176558672..1176560407   2 (4447312..4449047)   1736
>>>>    4: [13320..15239]:  1176565336..1176567255   2 (4453976..4455895)   1920
>>>>    5: [15240..17183]:  1176570776..1176572719   2 (4459416..4461359)   1944
>>>>    6: [17184..18999]:  1176575856..1176577671   2 (4464496..4466311)   1816
>>>>    7: [19000..20927]:  1176593552..1176595479   2 (4482192..4484119)   1928
>>>>    8: [20928..22703]:  1176599128..1176600903   2 (4487768..4489543)   1776
>>>>    9: [22704..24575]:  1176602704..1176604575   2 (4491344..4493215)   1872
>>>>   10: [24576..26495]:  1176611936..1176613855   2 (4500576..4502495)   1920
>>>>   11: [26496..26655]:  1176615040..1176615199   2 (4503680..4503839)    160
>>>>   12: [26656..46879]:  hole                                           20224
>>>>
>>>> But in some other WAL I see this:
>>>> # xfs_bmap -v
>>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa8904bdb97a02078a91822af/index.000000000
>>>>
>>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL
>>>>    0: [0..7]:          1758753776..1758753783   3 (586736..586743)        8
>>>>    1: [8..46879]:      hole                                           46872
>>>>
>>>> Looks like only 8 blocks are actually used and all the other blocks are
>>>> a hole.
>>>>
>>>> So it looks like I can use the formulas with confidence.
>>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB =
>>>> ~1.3 TB (+ some minor index overhead)
>>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB =
>>>> ~16 GB (+ some minor index overhead)
>>>>
>>>> Right?
>>>>
>>>> On Wed, 19 Jun 2019 at 09:35, Todd Lipcon wrote:
>>>>
>>>>> Hi Pavel,
>>>>>
>>>>> That's not quite expected. For example, on one of our test clusters
>>>>> here, we have about 65GB of WALs and about 1GB of index files. If I recall
>>>>> correctly, the index files store 8 bytes per WAL entry, so typically a
>>>>> couple orders of magnitude smaller than the WALs themselves.
>>>>>
>>>>> One thing is that the index files are sparse. Any chance you've done
>>>>> something like copy the files away and back that might cause them to lose
>>>>> their sparseness? If I use du --apparent-size on mine, it's a total of about
>>>>> 180GB vs the 1GB of actual size.
>>>>>
>>>>> Also, any chance you're using XFS here? XFS sometimes likes to
>>>>> preallocate large amounts of data into files while they're open, and only
>>>>> frees it up if disk space is contended. I think you can use 'xfs_bmap' on
>>>>> an index file to see the allocation status, which might be interesting.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov wrote:
>>>>>
>>>>>> Hi guys!
>>>>>>
>>>>>> We want to buy SSDs for the TServers' WALs in our cluster. I'm working on
>>>>>> a capacity estimate for these SSDs using the "Getting Started with Kudu" book,
>>>>>> Chapter 4, Write-Ahead Log (
>>>>>> https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html
>>>>>> ).
>>>>>>
>>>>>> NB: we use default Kudu WAL configuration settings.
>>>>>>
>>>>>> There is a formula for the worst case:
>>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>>
>>>>>> So, this formula takes into account only segment files. But in our
>>>>>> cluster, I see that every segment file has >= 1 corresponding index file,
>>>>>> and every index file is actually larger than its segment file.
>>>>>>
>>>>>> Numbers from one of our nodes.
>>>>>> WAL count:
>>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>>> 711
>>>>>>
>>>>>> Overall WAL size:
>>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>>
>>>>>> Size of all segment files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>>> 6.1G    total
>>>>>>
>>>>>> Size of all index files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>>> 6.5G    total
>>>>>>
>>>>>> So I have questions.
>>>>>>
>>>>>> 1. How can I estimate the size of the index files?
>>>>>> It looks like in our cluster the size of the index files is approximately
>>>>>> equal to the size of the segment files.
>>>>>>
>>>>>> 2. There are some WALs with more than one index file. For example:
>>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>>> total 296M
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>>
>>>>>> Is this a normal situation?
>>>>>>
>>>>>> 3. Not a question. Please consider adding documentation about
>>>>>> estimating WAL storage. Also, I couldn't find any mention of index
>>>>>> files, except here:
>>>>>> https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> with best regards, Pavel Martynov
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>> --
>> with best regards, Pavel Martynov
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
with best regards, Pavel Martynov
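The du discrepancy discussed throughout this thread comes from file sparseness: `du` counts allocated blocks, while `du --apparent-size` reports the size recorded in file metadata. The effect can be reproduced with a throwaway sparse file on most Linux filesystems (a generic sketch, not Kudu-specific):

```shell
# Create a 10 MiB sparse file: the size is recorded in metadata, but no
# data blocks are allocated until something is actually written.
f=$(mktemp)
truncate -s 10M "$f"

apparent_kb=$(du -k --apparent-size "$f" | cut -f1)   # metadata size: 10240 KiB
ondisk_kb=$(du -k "$f" | cut -f1)                     # allocated blocks: ~0 KiB

echo "apparent: ${apparent_kb} KiB, on disk: ${ondisk_kb} KiB"
rm -f "$f"
```

Copying such a file with a tool that is not hole-aware writes out the zeros and makes the on-disk size match the apparent size, which is exactly the "lost sparseness" scenario raised above.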