From: Todd Lipcon
Date: Wed, 26 Jun 2019 22:58:21 -0700
Subject: Re: WAL size estimation
To: user@kudu.apache.org

Hey Pavel,

I went back and looked at the source here. It appears that 24MB is the expected size for an index file -- each entry is 24 bytes, and the index file should keep 1M entries.

That said, for a "cold" tablet (in which you'd have only a small number of actual WAL files) I would expect only a single index file. The example you gave, where you have 12 index files but only one WAL segment, seems quite fishy to me. Having 12 index files indicates you have 12M separate WAL entries, but given that you have only 8MB of WAL, that implies each entry is less than one byte large, which doesn't make much sense at all.

If you go back and look at that same tablet now, did it eventually GC those log index files?

-Todd

On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov wrote:
>
> > Try adding the '-p' flag here? That should show preallocated extents.
> > Would be interesting to run it on some index file which is larger than 1MB, for example.
>
> # du -h --apparent-size index.000000108
> 23M     index.000000108
>
> # du -h index.000000108
> 23M     index.000000108
>
> # xfs_bmap -v -p index.000000108
> index.000000108:
>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET          TOTAL FLAGS
>    0: [0..2719]:       1175815920..1175818639   2 (3704560..3707279)  2720 00000
>    1: [2720..5111]:    1175828904..1175831295   2 (3717544..3719935)  2392 00000
>    2: [5112..7767]:    1175835592..1175838247   2 (3724232..3726887)  2656 00000
>    3: [7768..10567]:   1175849896..1175852695   2 (3738536..3741335)  2800 00000
>    4: [10568..15751]:  1175877808..1175882991   2 (3766448..3771631)  5184 00000
>    5: [15752..18207]:  1175898864..1175901319   2 (3787504..3789959)  2456 00000
>    6: [18208..20759]:  1175909192..1175911743   2 (3797832..3800383)  2552 00000
>    7: [20760..23591]:  1175921616..1175924447   2 (3810256..3813087)  2832 00000
>    8: [23592..26207]:  1175974872..1175977487   2 (3863512..3866127)  2616 00000
>    9: [26208..28799]:  1175989496..1175992087   2 (3878136..3880727)  2592 00000
>   10: [28800..31199]:  1175998552..1176000951   2 (3887192..3889591)  2400 00000
>   11: [31200..33895]:  1176008336..1176011031   2 (3896976..3899671)  2696 00000
>   12: [33896..36591]:  1176031696..1176034391   2 (3920336..3923031)  2696 00000
>   13: [36592..39191]:  1176037440..1176040039   2 (3926080..3928679)  2600 00000
>   14: [39192..41839]:  1176072008..1176074655   2 (3960648..3963295)  2648 00000
>   15: [41840..44423]:  1176097752..1176100335   2 (3986392..3988975)  2584 00000
>   16: [44424..46879]:  1176132144..1176134599   2 (4020784..4023239)  2456 00000
>
> On Wed, 19 Jun 2019 at 10:56, Todd Lipcon <todd@cloudera.com> wrote:
>>
>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov wrote:
>>>
>>> Hi Todd, thanks for the answer!
>>>
>>> > Any chance you've done something like copy the files away and back that might cause them to lose their sparseness?
>>>
>>> No, I don't think so. Recently we experienced some stability problems with Kudu and ran the rebalancer a couple of times, if that is related. But we never used fs commands like cp/mv against the Kudu dirs.
>>>
>>> I ran du on the whole WALs dir:
>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>
>>> And on a WAL with many indexes:
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>> # du -sh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>> > Also, any chance you're using XFS here?
>>>
>>> Yes, exactly XFS. We use CentOS 7.6.
>>>
>>> What is interesting is that there are not many holes in the index files in /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL dir I mentioned before).
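The du numbers quoted above differ because sparse files occupy fewer blocks on disk than their apparent length suggests. The two views can be compared programmatically with a plain stat, without du. This is only an illustrative sketch (the helper name is made up here); note that the allocated size of a freshly written file can also be inflated by filesystem preallocation, which is exactly the XFS behavior discussed in this thread:

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent_size, allocated_bytes) for a file.

    st_size is what 'du --apparent-size' reports; st_blocks is always
    counted in 512-byte units, so st_blocks * 512 matches plain 'du'."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Demo on a deliberately sparse file: 8 bytes of data, 23MB apparent size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"12345678")
    f.truncate(23 * 1024 * 1024)  # extends the file without writing blocks
    path = f.name

apparent, allocated = apparent_and_allocated(path)
print(apparent)   # 24117248 -- the "23M" an ls/du --apparent-size would show
os.remove(path)
```

On most filesystems the allocated figure for such a file stays tiny, which is why a directory of mostly-hole index files can show 1GB under du but 180GB under du --apparent-size.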
>>> Only a single hole in a single index file (of 13 files):
>>> # xfs_bmap -v index.000000120
>>
>> Try adding the '-p' flag here? That should show preallocated extents.
>> Would be interesting to run it on some index file which is larger than 1MB, for example.
>>
>>> index.000000120:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET          TOTAL
>>>    0: [0..4231]:       1176541248..1176545479   2 (4429888..4434119)  4232
>>>    1: [4232..9815]:    1176546592..1176552175   2 (4435232..4440815)  5584
>>>    2: [9816..11583]:   1176552832..1176554599   2 (4441472..4443239)  1768
>>>    3: [11584..13319]:  1176558672..1176560407   2 (4447312..4449047)  1736
>>>    4: [13320..15239]:  1176565336..1176567255   2 (4453976..4455895)  1920
>>>    5: [15240..17183]:  1176570776..1176572719   2 (4459416..4461359)  1944
>>>    6: [17184..18999]:  1176575856..1176577671   2 (4464496..4466311)  1816
>>>    7: [19000..20927]:  1176593552..1176595479   2 (4482192..4484119)  1928
>>>    8: [20928..22703]:  1176599128..1176600903   2 (4487768..4489543)  1776
>>>    9: [22704..24575]:  1176602704..1176604575   2 (4491344..4493215)  1872
>>>   10: [24576..26495]:  1176611936..1176613855   2 (4500576..4502495)  1920
>>>   11: [26496..26655]:  1176615040..1176615199   2 (4503680..4503839)   160
>>>   12: [26656..46879]:  hole                                          20224
>>>
>>> But in some other WAL I see this:
>>> # xfs_bmap -v /mnt/data01/kudu-tserver-wal/wals/508ecdfa8904bdb97a02078a91822af/index.000000000
>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE             AG AG-OFFSET        TOTAL
>>>    0: [0..7]:          1758753776..1758753783   3 (586736..586743)     8
>>>    1: [8..46879]:      hole                                        46872
>>>
>>> It looks like only 8 blocks are actually used and all the other blocks are the hole.
>>>
>>> So it looks like I can use the formulas with confidence.
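Todd's arithmetic about the expected index-file size can be double-checked in a few lines. The 24-bytes-per-entry and 1M-entries-per-file figures are the ones he quotes from the source earlier in this thread (taken on faith here, not re-verified), and the function names are purely illustrative:

```python
# Figures quoted by Todd from the Kudu source (assumptions, not re-verified):
ENTRY_BYTES = 24              # bytes per log index entry
ENTRIES_PER_FILE = 1_000_000  # entries kept per index file

def full_index_file_mb():
    """Expected size of a full log index file, in MB."""
    return ENTRY_BYTES * ENTRIES_PER_FILE / 1_000_000

def implied_entry_bytes(wal_bytes, index_files):
    """If N full index files exist, that implies N million WAL entries;
    dividing the observed WAL bytes by that count gives the implied
    per-entry size -- values below 1 byte signal something is off."""
    return wal_bytes / (index_files * ENTRIES_PER_FILE)

print(full_index_file_mb())                  # 24.0 -- matches the ~23M files seen
print(implied_entry_bytes(8 * 1024**2, 12))  # well under 1 byte/entry: the "fishy" case
```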
>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB (+ some minor index overhead)
>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB = ~16 GB (+ some minor index overhead)
>>>
>>> Right?
>>>
>>> On Wed, 19 Jun 2019 at 09:35, Todd Lipcon <todd@cloudera.com> wrote:
>>>>
>>>> Hi Pavel,
>>>>
>>>> That's not quite expected. For example, on one of our test clusters here, we have about 65GB of WALs and about 1GB of index files. If I recall correctly, the index files store 8 bytes per WAL entry, so they are typically a couple of orders of magnitude smaller than the WALs themselves.
>>>>
>>>> One thing is that the index files are sparse. Any chance you've done something like copy the files away and back that might cause them to lose their sparseness? If I use du --apparent-size on mine, it's a total of about 180GB vs the 1GB of actual size.
>>>>
>>>> Also, any chance you're using XFS here? XFS sometimes likes to preallocate large amounts of data into files while they're open, and only frees it up if disk space is contended. I think you can use 'xfs_bmap' on an index file to see the allocation status, which might be interesting.
>>>>
>>>> -Todd
>>>>
>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov wrote:
>>>>>
>>>>> Hi guys!
>>>>>
>>>>> We want to buy SSDs for the tablet servers' WALs in our cluster. I'm working on a capacity estimate for these SSDs using the "Getting Started with Kudu" book, Chapter 4, Write-Ahead Log (https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html).
>>>>>
>>>>> NB: we use the default Kudu WAL configuration settings.
>>>>>
>>>>> There is a formula for the worst case:
>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>
>>>>> So, this formula takes into account only segment files.
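The worst-case formula from the book, as quoted in this thread, is easy to wrap in a helper for trying different cluster shapes. This is a sketch with hypothetical names; the defaults mirror the 8 MB segment / 80 max segments / 2000 tablets figures above, and index-file overhead is deliberately excluded, as in the book's formula:

```python
def wal_footprint_mb(segment_mb=8, segments_per_tablet=80, tablets=2000):
    """WAL disk footprint in MB: segment size x segments x tablets.

    With the default retention cap of 80 segments this is the worst case;
    with 1 segment per tablet it approximates a quiet ("cold") cluster.
    Log index files are not included, per the discussion in this thread."""
    return segment_mb * segments_per_tablet * tablets

print(wal_footprint_mb())                       # 1280000 MB ~= 1.3 TB (worst case)
print(wal_footprint_mb(segments_per_tablet=1))  # 16000 MB ~= 16 GB (best case)
```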
>>>>> But in our cluster, I see that every segment file has >= 1 corresponding index files, and every index file is actually larger than its segment file.
>>>>>
>>>>> Numbers from one of our nodes.
>>>>>
>>>>> WALs count:
>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>> 711
>>>>>
>>>>> Overall WAL size:
>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>
>>>>> Size of all segment files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>> 6.1G    total
>>>>>
>>>>> Size of all index files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>> 6.5G    total
>>>>>
>>>>> So I have questions.
>>>>>
>>>>> 1. How can I estimate the size of the index files? It looks like in our cluster the total size of the index files is approximately equal to the total size of the segment files.
>>>>>
>>>>> 2. There are some WALs with more than one index file. For example:
>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>> total 296M
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>
>>>>> Is this a normal situation?
>>>>>
>>>>> 3. Not a question. Please consider adding documentation about estimating WAL storage. Also, I can't find any mention of index files, except here: https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> with best regards, Pavel Martynov
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>> --
>>> with best regards, Pavel Martynov
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
> --
> with best regards, Pavel Martynov

--
Todd Lipcon
Software Engineer, Cloudera