From: Todd Lipcon
Date: Wed, 26 Jun 2019 22:58:21 -0700
Subject: Re: WAL size estimation
To: user@kudu.apache.org

Hey Pavel,

I went back and looked at the source here. It appears that 24MB is the expected size for an index file -- each entry is 24 bytes, and the index file should keep 1M entries.

That said, for a "cold" tablet (in which you'd have only a small number of actual WAL files) I would expect only a single index file. The example you gave, where you have 12 index files but only one WAL segment, seems quite fishy to me. Having 12 index files indicates you have 12M separate WAL entries, but given that you have only 8MB of WAL, that implies each entry is less than one byte large, which doesn't make much sense at all.

If you go back and look at that same tablet now, did it eventually GC those log index files?

-Todd

On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov wrote:
>
> > Try adding the '-p' flag here? That should show preallocated extents.
> > Would be interesting to run it on some index file which is larger than 1MB, for example.
>
> # du -h --apparent-size index.000000108
> 23M     index.000000108
>
> # du -h index.000000108
> 23M     index.000000108
>
> # xfs_bmap -v -p index.000000108
> index.000000108:
>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET          TOTAL FLAGS
>    0: [0..2719]:       1175815920..1175818639   2 (3704560..3707279)  2720 00000
>    1: [2720..5111]:    1175828904..1175831295   2 (3717544..3719935)  2392 00000
>    2: [5112..7767]:    1175835592..1175838247   2 (3724232..3726887)  2656 00000
>    3: [7768..10567]:   1175849896..1175852695   2 (3738536..3741335)  2800 00000
>    4: [10568..15751]:  1175877808..1175882991   2 (3766448..3771631)  5184 00000
>    5: [15752..18207]:  1175898864..1175901319   2 (3787504..3789959)  2456 00000
>    6: [18208..20759]:  1175909192..1175911743   2 (3797832..3800383)  2552 00000
>    7: [20760..23591]:  1175921616..1175924447   2 (3810256..3813087)  2832 00000
>    8: [23592..26207]:  1175974872..1175977487   2 (3863512..3866127)  2616 00000
>    9: [26208..28799]:  1175989496..1175992087   2 (3878136..3880727)  2592 00000
>   10: [28800..31199]:  1175998552..1176000951   2 (3887192..3889591)  2400 00000
>   11: [31200..33895]:  1176008336..1176011031   2 (3896976..3899671)  2696 00000
>   12: [33896..36591]:  1176031696..1176034391   2 (3920336..3923031)  2696 00000
>   13: [36592..39191]:  1176037440..1176040039   2 (3926080..3928679)  2600 00000
>   14: [39192..41839]:  1176072008..1176074655   2 (3960648..3963295)  2648 00000
>   15: [41840..44423]:  1176097752..1176100335   2 (3986392..3988975)  2584 00000
>   16: [44424..46879]:  1176132144..1176134599   2 (4020784..4023239)  2456 00000
>
> On Wed, 19 Jun 2019 at 10:56, Todd Lipcon <todd@cloudera.com> wrote:
>>
>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov wrote:
>>>
>>> Hi Todd, thanks for the answer!
>>>
>>> > Any chance you've done something like copy the files away and back that might cause them to lose their sparseness?
>>>
>>> No, I don't think so. Recently we experienced some stability problems with Kudu and ran the rebalancer a couple of times, if that is related. But we never used fs commands like cp/mv against the Kudu dirs.
>>>
>>> I ran du on the whole WALs dir:
>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>
>>> And on a WAL with many indexes:
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>> # du -sh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>> > Also, any chance you're using XFS here?
>>>
>>> Yes, exactly XFS. We use CentOS 7.6.
>>>
>>> What is interesting is that there are not many holes in the index files in /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL dir I mentioned before).
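The du numbers quoted above differ because sparse files occupy fewer blocks on disk than their apparent length suggests. The two views can be compared programmatically with a plain stat, without du. This is only an illustrative sketch (the helper name is made up here); note that the allocated size of a freshly written file can also be inflated by filesystem preallocation, which is exactly the XFS behavior discussed in this thread:

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent_size, allocated_bytes) for a file.

    st_size is what 'du --apparent-size' reports; st_blocks is always
    counted in 512-byte units, so st_blocks * 512 matches plain 'du'."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Demo on a deliberately sparse file: 8 bytes of data, 23MB apparent size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"12345678")
    f.truncate(23 * 1024 * 1024)  # extends the file without writing blocks
    path = f.name

apparent, allocated = apparent_and_allocated(path)
print(apparent)   # 24117248 -- the "23M" an ls/du --apparent-size would show
os.remove(path)
```

On most filesystems the allocated figure for such a file stays tiny, which is why a directory of mostly-hole index files can show 1GB under du but 180GB under du --apparent-size.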
>>> Only a single hole in a single index file (of 13 files):
>>> # xfs_bmap -v index.000000120
>>
>> Try adding the '-p' flag here? That should show preallocated extents.
>> Would be interesting to run it on some index file which is larger than 1MB, for example.
>>
>>> index.000000120:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET          TOTAL
>>>    0: [0..4231]:       1176541248..1176545479   2 (4429888..4434119)  4232
>>>    1: [4232..9815]:    1176546592..1176552175   2 (4435232..4440815)  5584
>>>    2: [9816..11583]:   1176552832..1176554599   2 (4441472..4443239)  1768
>>>    3: [11584..13319]:  1176558672..1176560407   2 (4447312..4449047)  1736
>>>    4: [13320..15239]:  1176565336..1176567255   2 (4453976..4455895)  1920
>>>    5: [15240..17183]:  1176570776..1176572719   2 (4459416..4461359)  1944
>>>    6: [17184..18999]:  1176575856..1176577671   2 (4464496..4466311)  1816
>>>    7: [19000..20927]:  1176593552..1176595479   2 (4482192..4484119)  1928
>>>    8: [20928..22703]:  1176599128..1176600903   2 (4487768..4489543)  1776
>>>    9: [22704..24575]:  1176602704..1176604575   2 (4491344..4493215)  1872
>>>   10: [24576..26495]:  1176611936..1176613855   2 (4500576..4502495)  1920
>>>   11: [26496..26655]:  1176615040..1176615199   2 (4503680..4503839)   160
>>>   12: [26656..46879]:  hole                                          20224
>>>
>>> But in some other WAL I see this:
>>> # xfs_bmap -v /mnt/data01/kudu-tserver-wal/wals/508ecdfa8904bdb97a02078a91822af/index.000000000
>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET        TOTAL
>>>    0: [0..7]:          1758753776..1758753783   3 (586736..586743)     8
>>>    1: [8..46879]:      hole                                        46872
>>>
>>> It looks like only 8 blocks are actually used and all the other blocks are the hole.
>>>
>>> So it looks like I can use the formulas with confidence.
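Todd's arithmetic about the expected index-file size can be double-checked in a few lines. The 24-bytes-per-entry and 1M-entries-per-file figures are the ones he quotes from the source earlier in this thread (taken on faith here, not re-verified), and the function names are purely illustrative:

```python
# Figures quoted by Todd from the Kudu source (assumptions, not re-verified):
ENTRY_BYTES = 24              # bytes per log index entry
ENTRIES_PER_FILE = 1_000_000  # entries kept per index file

def full_index_file_mb():
    """Expected size of a full log index file, in MB."""
    return ENTRY_BYTES * ENTRIES_PER_FILE / 1_000_000

def implied_entry_bytes(wal_bytes, index_files):
    """If N full index files exist, that implies N million WAL entries;
    dividing the observed WAL bytes by that count gives the implied
    per-entry size -- values below 1 byte signal something is off."""
    return wal_bytes / (index_files * ENTRIES_PER_FILE)

print(full_index_file_mb())                  # 24.0 -- matches the ~23M files seen
print(implied_entry_bytes(8 * 1024**2, 12))  # well under 1 byte/entry: the "fishy" case
```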
>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB (+ some minor index overhead)
>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB = ~16 GB (+ some minor index overhead)
>>>
>>> Right?
>>>
>>> On Wed, 19 Jun 2019 at 09:35, Todd Lipcon <todd@cloudera.com> wrote:
>>>>
>>>> Hi Pavel,
>>>>
>>>> That's not quite expected. For example, on one of our test clusters here, we have about 65GB of WALs and about 1GB of index files. If I recall correctly, the index files store 8 bytes per WAL entry, so they are typically a couple of orders of magnitude smaller than the WALs themselves.
>>>>
>>>> One thing is that the index files are sparse. Any chance you've done something like copy the files away and back that might cause them to lose their sparseness? If I use du --apparent-size on mine, it's a total of about 180GB vs the 1GB of actual size.
>>>>
>>>> Also, any chance you're using XFS here? XFS sometimes likes to preallocate large amounts of data into files while they're open, and only frees it up if disk space is contended. I think you can use 'xfs_bmap' on an index file to see the allocation status, which might be interesting.
>>>>
>>>> -Todd
>>>>
>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov wrote:
>>>>>
>>>>> Hi guys!
>>>>>
>>>>> We want to buy SSDs for the tablet servers' WALs in our cluster. I'm working on a capacity estimate for these SSDs using the "Getting Started with Kudu" book, Chapter 4, Write-Ahead Log (https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html).
>>>>>
>>>>> NB: we use the default Kudu WAL configuration settings.
>>>>>
>>>>> There is a formula for the worst case:
>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>
>>>>> So, this formula takes into account only segment files.
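The worst-case formula from the book, as quoted in this thread, is easy to wrap in a helper for trying different cluster shapes. This is a sketch with hypothetical names; the defaults mirror the 8 MB segment / 80 max segments / 2000 tablets figures above, and index-file overhead is deliberately excluded, as in the book's formula:

```python
def wal_footprint_mb(segment_mb=8, segments_per_tablet=80, tablets=2000):
    """WAL disk footprint in MB: segment size x segments x tablets.

    With the default retention cap of 80 segments this is the worst case;
    with 1 segment per tablet it approximates a quiet ("cold") cluster.
    Log index files are not included, per the discussion in this thread."""
    return segment_mb * segments_per_tablet * tablets

print(wal_footprint_mb())                       # 1280000 MB ~= 1.3 TB (worst case)
print(wal_footprint_mb(segments_per_tablet=1))  # 16000 MB ~= 16 GB (best case)
```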
>>>>> But in our cluster, I see that every segment file has >= 1 corresponding index files, and every index file is actually larger than its segment file.
>>>>>
>>>>> Numbers from one of our nodes.
>>>>>
>>>>> WALs count:
>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>> 711
>>>>>
>>>>> Overall WAL size:
>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>
>>>>> Size of all segment files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>> 6.1G    total
>>>>>
>>>>> Size of all index files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>> 6.5G    total
>>>>>
>>>>> So I have questions.
>>>>>
>>>>> 1. How can I estimate the size of the index files? It looks like in our cluster the total size of the index files is approximately equal to the total size of the segment files.
>>>>>
>>>>> 2. There are some WALs with more than one index file. For example:
>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>> total 296M
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>
>>>>> Is this a normal situation?
>>>>>
>>>>> 3. Not a question. Please consider adding documentation about estimating WAL storage. Also, I can't find any mention of index files, except here: https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> with best regards, Pavel Martynov
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>> --
>>> with best regards, Pavel Martynov
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
> --
> with best regards, Pavel Martynov

--
Todd Lipcon
Software Engineer, Cloudera