From: Pavel Martynov
Date: Thu, 27 Jun 2019 09:58:27 +0300
Subject: Re: WAL size estimation
To: user@kudu.apache.org

Hi Todd,

This tablet disappeared from the WAL path. I think it was a time partition
that we had already removed.

On Thu, 27 Jun 2019 at 08:58, Todd Lipcon wrote:

> Hey Pavel,
>
> I went back and looked at the source here. It appears that 24MB is the
> expected size for an index file -- each entry is 24 bytes and the index
> file should keep 1M entries.
>
> That said, for a "cold tablet" (in which you'd have only a small number of
> actual WAL files) I would expect only a single index file. The example you
> gave where you have 12 index files but only one WAL segment seems quite
> fishy to me. Having 12 index files indicates you have 12M separate WAL
> entries, but given you have only 8MB of WAL, that indicates each entry is
> less than one byte large, which doesn't make much sense at all.
>
> If you go back and look at that same tablet now, did it eventually GC
> those log index files?
>
> -Todd
>
> On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov wrote:
>
>> > Try adding the '-p' flag here? That should show preallocated extents.
>> Would be interesting to run it on some index file which is larger than 1MB,
>> for example.
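The arithmetic above is easy to sanity-check. A quick shell sketch using only the numbers quoted in this thread (24-byte entries, 1M entries per index file, 12 index files against one 8 MiB segment) -- this is back-of-envelope math, not Kudu code:

```shell
# Expected size of one full log index file, assuming 24-byte entries
# and 1M entries per file, as stated in the reply above.
entry_bytes=24
entries_per_index=$((1024 * 1024))
index_mib=$(( entry_bytes * entries_per_index / 1024 / 1024 ))
echo "full index file: ${index_mib} MiB"    # 24 MiB

# 12 index files imply ~12M WAL entries; with only 8 MiB of WAL that
# works out to less than one byte per entry, hence the "fishy" verdict.
wal_bytes=$((8 * 1024 * 1024))
implied_entries=$((12 * entries_per_index))
echo "WAL bytes per implied entry: $(( wal_bytes / implied_entries ))"   # 0
```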
>>
>> # du -h --apparent-size index.000000108
>> 23M     index.000000108
>>
>> # du -h index.000000108
>> 23M     index.000000108
>>
>> # xfs_bmap -v -p index.000000108
>> index.000000108:
>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL FLAGS
>>    0: [0..2719]:       1175815920..1175818639   2 (3704560..3707279)   2720 00000
>>    1: [2720..5111]:    1175828904..1175831295   2 (3717544..3719935)   2392 00000
>>    2: [5112..7767]:    1175835592..1175838247   2 (3724232..3726887)   2656 00000
>>    3: [7768..10567]:   1175849896..1175852695   2 (3738536..3741335)   2800 00000
>>    4: [10568..15751]:  1175877808..1175882991   2 (3766448..3771631)   5184 00000
>>    5: [15752..18207]:  1175898864..1175901319   2 (3787504..3789959)   2456 00000
>>    6: [18208..20759]:  1175909192..1175911743   2 (3797832..3800383)   2552 00000
>>    7: [20760..23591]:  1175921616..1175924447   2 (3810256..3813087)   2832 00000
>>    8: [23592..26207]:  1175974872..1175977487   2 (3863512..3866127)   2616 00000
>>    9: [26208..28799]:  1175989496..1175992087   2 (3878136..3880727)   2592 00000
>>   10: [28800..31199]:  1175998552..1176000951   2 (3887192..3889591)   2400 00000
>>   11: [31200..33895]:  1176008336..1176011031   2 (3896976..3899671)   2696 00000
>>   12: [33896..36591]:  1176031696..1176034391   2 (3920336..3923031)   2696 00000
>>   13: [36592..39191]:  1176037440..1176040039   2 (3926080..3928679)   2600 00000
>>   14: [39192..41839]:  1176072008..1176074655   2 (3960648..3963295)   2648 00000
>>   15: [41840..44423]:  1176097752..1176100335   2 (3986392..3988975)   2584 00000
>>   16: [44424..46879]:  1176132144..1176134599   2 (4020784..4023239)   2456 00000
>>
>> On Wed, 19 Jun 2019 at 10:56, Todd Lipcon wrote:
>>
>>>
>>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov wrote:
>>>
>>>> Hi Todd, thanks for the answer!
>>>>
>>>> > Any chance you've done something like copy the files away and back
>>>> that might cause them to lose their sparseness?
>>>>
>>>> No, I don't think so. Recently we experienced some stability problems
>>>> with Kudu and ran the rebalancer a couple of times, if that is related.
>>>> But we never used fs commands like cp/mv against Kudu dirs.
>>>>
>>>> I ran du on the all-WALs dir:
>>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> And on a WAL dir with many index files:
>>>> # du -sh --apparent-size
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>> # du -sh
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>> > Also, any chance you're using XFS here?
>>>>
>>>> Yes, exactly XFS. We use CentOS 7.6.
>>>>
>>>> Interestingly, there are not many holes in the index files in
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL dir
>>>> I mentioned before). Only a single hole in a single index file (of 13 files):
>>>> # xfs_bmap -v index.000000120
>>>
>>> Try adding the '-p' flag here? That should show preallocated extents.
>>> Would be interesting to run it on some index file which is larger than 1MB,
>>> for example.
>>>
>>>> index.000000120:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL
>>>>    0: [0..4231]:       1176541248..1176545479   2 (4429888..4434119)   4232
>>>>    1: [4232..9815]:    1176546592..1176552175   2 (4435232..4440815)   5584
>>>>    2: [9816..11583]:   1176552832..1176554599   2 (4441472..4443239)   1768
>>>>    3: [11584..13319]:  1176558672..1176560407   2 (4447312..4449047)   1736
>>>>    4: [13320..15239]:  1176565336..1176567255   2 (4453976..4455895)   1920
>>>>    5: [15240..17183]:  1176570776..1176572719   2 (4459416..4461359)   1944
>>>>    6: [17184..18999]:  1176575856..1176577671   2 (4464496..4466311)   1816
>>>>    7: [19000..20927]:  1176593552..1176595479   2 (4482192..4484119)   1928
>>>>    8: [20928..22703]:  1176599128..1176600903   2 (4487768..4489543)   1776
>>>>    9: [22704..24575]:  1176602704..1176604575   2 (4491344..4493215)   1872
>>>>   10: [24576..26495]:  1176611936..1176613855   2 (4500576..4502495)   1920
>>>>   11: [26496..26655]:  1176615040..1176615199   2 (4503680..4503839)    160
>>>>   12: [26656..46879]:  hole                                           20224
>>>>
>>>> But in some other WAL I see this:
>>>> # xfs_bmap -v
>>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa8904bdb97a02078a91822af/index.000000000
>>>>
>>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET           TOTAL
>>>>    0: [0..7]:          1758753776..1758753783   3 (586736..586743)        8
>>>>    1: [8..46879]:      hole                                           46872
>>>>
>>>> Looks like only 8 blocks are actually used and all the other blocks are
>>>> a hole.
>>>>
>>>> So it looks like I can use the formulas with confidence.
>>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB =
>>>> ~1.3 TB (+ some minor index overhead)
>>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB =
>>>> ~16 GB (+ some minor index overhead)
>>>>
>>>> Right?
>>>>
>>>> On Wed, 19 Jun 2019 at 09:35, Todd Lipcon wrote:
>>>>
>>>>> Hi Pavel,
>>>>>
>>>>> That's not quite expected. For example, on one of our test clusters
>>>>> here, we have about 65GB of WALs and about 1GB of index files. If I recall
>>>>> correctly, the index files store 8 bytes per WAL entry, so typically a
>>>>> couple orders of magnitude smaller than the WALs themselves.
>>>>>
>>>>> One thing is that the index files are sparse. Any chance you've done
>>>>> something like copy the files away and back that might cause them to lose
>>>>> their sparseness? If I use du --apparent-size on mine, it's a total of about
>>>>> 180GB vs the 1GB of actual size.
>>>>>
>>>>> Also, any chance you're using XFS here? XFS sometimes likes to
>>>>> preallocate large amounts of data into files while they're open, and only
>>>>> frees it up if disk space is contended. I think you can use 'xfs_bmap' on
>>>>> an index file to see the allocation status, which might be interesting.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov wrote:
>>>>>
>>>>>> Hi guys!
>>>>>>
>>>>>> We want to buy SSDs for the TServers' WALs in our cluster. I'm working on
>>>>>> a capacity estimate for these SSDs using the "Getting Started with Kudu" book,
>>>>>> Chapter 4, Write-Ahead Log (
>>>>>> https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html
>>>>>> ).
>>>>>>
>>>>>> NB: we use default Kudu WAL configuration settings.
>>>>>>
>>>>>> There is a formula for the worst case:
>>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>>
>>>>>> So, this formula takes into account only segment files. But in our
>>>>>> cluster, I see that every segment file has >= 1 corresponding index file,
>>>>>> and every index file is actually larger than its segment file.
>>>>>>
>>>>>> Numbers from one of our nodes.
>>>>>> WAL count:
>>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>>> 711
>>>>>>
>>>>>> Overall WAL size:
>>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>>
>>>>>> Size of all segment files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>>> 6.1G    total
>>>>>>
>>>>>> Size of all index files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>>> 6.5G    total
>>>>>>
>>>>>> So I have questions.
>>>>>>
>>>>>> 1. How can I estimate the size of the index files?
>>>>>> It looks like in our cluster the size of the index files is approximately
>>>>>> equal to the size of the segment files.
>>>>>>
>>>>>> 2. There are some WALs with more than one index file. For example:
>>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>>> total 296M
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>>
>>>>>> Is this a normal situation?
>>>>>>
>>>>>> 3. Not a question. Please consider adding documentation about
>>>>>> estimating WAL storage. Also, I couldn't find any mention of index
>>>>>> files, except here:
>>>>>> https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> with best regards, Pavel Martynov
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>> --
>> with best regards, Pavel Martynov
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
with best regards, Pavel Martynov
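The du discrepancy discussed throughout this thread comes from file sparseness: `du` counts allocated blocks, while `du --apparent-size` reports the size recorded in file metadata. The effect can be reproduced with a throwaway sparse file on most Linux filesystems (a generic sketch, not Kudu-specific):

```shell
# Create a 10 MiB sparse file: the size is recorded in metadata, but no
# data blocks are allocated until something is actually written.
f=$(mktemp)
truncate -s 10M "$f"

apparent_kb=$(du -k --apparent-size "$f" | cut -f1)   # metadata size: 10240 KiB
ondisk_kb=$(du -k "$f" | cut -f1)                     # allocated blocks: ~0 KiB

echo "apparent: ${apparent_kb} KiB, on disk: ${ondisk_kb} KiB"
rm -f "$f"
```

Copying such a file with a tool that is not hole-aware writes out the zeros and makes the on-disk size match the apparent size, which is exactly the "lost sparseness" scenario raised above.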