kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: WAL size estimation
Date Thu, 27 Jun 2019 05:58:21 GMT
Hey Pavel,

I went back and looked at the source here. It appears that 24MB is the
expected size for an index file -- each entry is 24 bytes and the index
file should keep 1M entries.
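
A minimal sketch of that arithmetic follows. It assumes "1M" means 1,000,000 entries and that the file length is rounded up to a 4096-byte page; both are assumptions rather than values confirmed in this thread, and the constant names are illustrative, not Kudu identifiers.

```python
import math

ENTRY_SIZE_BYTES = 24          # per-entry size stated in the message above
ENTRIES_PER_FILE = 1_000_000   # assumption: "1M" read as decimal 10^6
PAGE_SIZE_BYTES = 4096         # assumption: file length rounded up to a 4 KiB page


def index_file_size(entry_size=ENTRY_SIZE_BYTES,
                    entries=ENTRIES_PER_FILE,
                    page=PAGE_SIZE_BYTES):
    """Apparent size in bytes of one full index file, rounded up to a page."""
    raw = entry_size * entries
    return math.ceil(raw / page) * page


size = index_file_size()
# xfs_bmap reports offsets in 512-byte blocks, so convert for comparison.
print(size, "bytes =", size // 512, "blocks of 512 bytes")
```

Under these assumptions the result is 46,880 blocks of 512 bytes, which lines up with the 0..46879 file-offset range in the xfs_bmap output quoted further down.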

That said, for a "cold tablet" (in which you'd have only a small number of
actual WAL files) I would expect only a single index file. The example you
gave where you have 12 index files but only one WAL segment seems quite
fishy to me. Having 12 index files indicates you have 12M separate WAL
entries, but given you have only 8MB of WAL, that indicates each entry is
less than one byte large, which doesn't make much sense at all.
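
The sanity check above can be sketched as follows, assuming 1,000,000 entries per index file and the single 8 MB segment from the quoted directory listing. The function name is hypothetical.

```python
WAL_BYTES = 8 * 1024 * 1024        # one 8 MB WAL segment, as in the listing
INDEX_FILES = 12                   # number of full index files discussed above
ENTRIES_PER_INDEX_FILE = 1_000_000 # assumption: "1M" read as decimal 10^6


def avg_entry_bytes(wal_bytes, index_files,
                    entries_per_file=ENTRIES_PER_INDEX_FILE):
    """Average WAL bytes per indexed entry implied by the file counts."""
    return wal_bytes / (index_files * entries_per_file)


implied = avg_entry_bytes(WAL_BYTES, INDEX_FILES)
# A value below 1.0 means the index files cannot all describe live WAL data.
print(f"{implied:.2f} bytes per entry")
```

A value under one byte per entry suggests that most of those index files refer to WAL segments that no longer exist on disk.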

If you go back and look at that same tablet now, did it eventually GC those
log index files?
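
One way to re-check a tablet's WAL directory is sketched below. It lists the index files with their apparent and allocated sizes, much like `du --apparent-size` versus `du`. The helper name is hypothetical, and the `index.` file-name prefix is taken from the directory listing quoted below.

```python
import os


def index_file_usage(wal_dir):
    """Return (name, apparent_bytes, allocated_bytes) for each index file."""
    rows = []
    for name in sorted(os.listdir(wal_dir)):
        if not name.startswith("index."):
            continue  # skip WAL segments and anything else
        st = os.stat(os.path.join(wal_dir, name))
        # st_blocks counts 512-byte units actually allocated on disk, so a
        # sparse file shows allocated_bytes well below st_size.
        rows.append((name, st.st_size, st.st_blocks * 512))
    return rows


# Example usage (the path is a placeholder for one tablet's WAL directory):
# for name, apparent, allocated in index_file_usage("/path/to/wals/<tablet-id>"):
#     print(name, apparent, allocated)
```

If only one or two entries come back for a cold tablet, the older index files have been garbage collected.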

-Todd



On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov <mr.xkurt@gmail.com> wrote:

> > Try adding the '-p' flag here? That should show preallocated extents.
> Would be interesting to run it on some index file which is larger than 1MB,
> for example.
>
> # du -h --apparent-size index.000000108
> 23M     index.000000108
>
> # du -h index.000000108
> 23M     index.000000108
>
> # xfs_bmap -v -p index.000000108
> index.000000108:
>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET          TOTAL FLAGS
>    0: [0..2719]:       1175815920..1175818639  2 (3704560..3707279)  2720 00000
>    1: [2720..5111]:    1175828904..1175831295  2 (3717544..3719935)  2392 00000
>    2: [5112..7767]:    1175835592..1175838247  2 (3724232..3726887)  2656 00000
>    3: [7768..10567]:   1175849896..1175852695  2 (3738536..3741335)  2800 00000
>    4: [10568..15751]:  1175877808..1175882991  2 (3766448..3771631)  5184 00000
>    5: [15752..18207]:  1175898864..1175901319  2 (3787504..3789959)  2456 00000
>    6: [18208..20759]:  1175909192..1175911743  2 (3797832..3800383)  2552 00000
>    7: [20760..23591]:  1175921616..1175924447  2 (3810256..3813087)  2832 00000
>    8: [23592..26207]:  1175974872..1175977487  2 (3863512..3866127)  2616 00000
>    9: [26208..28799]:  1175989496..1175992087  2 (3878136..3880727)  2592 00000
>   10: [28800..31199]:  1175998552..1176000951  2 (3887192..3889591)  2400 00000
>   11: [31200..33895]:  1176008336..1176011031  2 (3896976..3899671)  2696 00000
>   12: [33896..36591]:  1176031696..1176034391  2 (3920336..3923031)  2696 00000
>   13: [36592..39191]:  1176037440..1176040039  2 (3926080..3928679)  2600 00000
>   14: [39192..41839]:  1176072008..1176074655  2 (3960648..3963295)  2648 00000
>   15: [41840..44423]:  1176097752..1176100335  2 (3986392..3988975)  2584 00000
>   16: [44424..46879]:  1176132144..1176134599  2 (4020784..4023239)  2456 00000
>
>
>
>
>
> On Wed, Jun 19, 2019 at 10:56, Todd Lipcon <todd@cloudera.com> wrote:
>
>>
>>
>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov <mr.xkurt@gmail.com>
>> wrote:
>>
>>> Hi Todd, thanks for the answer!
>>>
>>> > Any chance you've done something like copy the files away and back
>>> that might cause them to lose their sparseness?
>>>
>>> No, I don't think so. Recently we experienced some stability problems
>>> with Kudu and ran rebalance a couple of times, in case that is related.
>>> But we never used filesystem commands like cp/mv against Kudu dirs.
>>>
>>> I ran du on the top-level WAL dir:
>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>
>>> And on the WAL dir with many index files:
>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>> # du -sh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>
>>>
>>> > Also, any chance you're using XFS here?
>>>
>>> Yes, exactly XFS. We use CentOS 7.6.
>>>
>>> What is interesting is that there are not many holes in the index files in
>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL
>>> dir that I mentioned before). There is only a single hole, in a single
>>> index file (of 13 files):
>>> # xfs_bmap -v index.000000120
>>>
>>
>> Try adding the '-p' flag here? That should show preallocated extents.
>> Would be interesting to run it on some index file which is larger than 1MB,
>> for example.
>>
>>
>>> index.000000120:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET          TOTAL
>>>    0: [0..4231]:       1176541248..1176545479  2 (4429888..4434119)  4232
>>>    1: [4232..9815]:    1176546592..1176552175  2 (4435232..4440815)  5584
>>>    2: [9816..11583]:   1176552832..1176554599  2 (4441472..4443239)  1768
>>>    3: [11584..13319]:  1176558672..1176560407  2 (4447312..4449047)  1736
>>>    4: [13320..15239]:  1176565336..1176567255  2 (4453976..4455895)  1920
>>>    5: [15240..17183]:  1176570776..1176572719  2 (4459416..4461359)  1944
>>>    6: [17184..18999]:  1176575856..1176577671  2 (4464496..4466311)  1816
>>>    7: [19000..20927]:  1176593552..1176595479  2 (4482192..4484119)  1928
>>>    8: [20928..22703]:  1176599128..1176600903  2 (4487768..4489543)  1776
>>>    9: [22704..24575]:  1176602704..1176604575  2 (4491344..4493215)  1872
>>>   10: [24576..26495]:  1176611936..1176613855  2 (4500576..4502495)  1920
>>>   11: [26496..26655]:  1176615040..1176615199  2 (4503680..4503839)   160
>>>   12: [26656..46879]:  hole                                         20224
>>>
>>> But in some other WAL dir I see this:
>>> # xfs_bmap -v /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000
>>>
>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET        TOTAL
>>>    0: [0..7]:          1758753776..1758753783  3 (586736..586743)     8
>>>    1: [8..46879]:      hole                                       46872
>>>
>>> It looks like only 8 blocks are actually used and all the other blocks
>>> are a hole.
>>>
>>>
>>> So it looks like I can use the formulas with confidence.
>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000
>>> MB = ~1.3 TB (+ some minor index overhead)
>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB = ~16
>>> GB (+ some minor index overhead)
>>>
>>> Right?
>>>
>>>
>>> On Wed, Jun 19, 2019 at 09:35, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>> Hi Pavel,
>>>>
>>>> That's not quite expected. For example, on one of our test clusters
>>>> here, we have about 65GB of WALs and about 1GB of index files. If I recall
>>>> correctly, the index files store 8 bytes per WAL entry, so typically a
>>>> couple orders of magnitude smaller than the WALs themselves.
>>>>
>>>> One thing is that the index files are sparse. Any chance you've done
>>>> something like copy the files away and back that might cause them to lose
>>>> their sparseness? If I use du --apparent-size on mine, it's a total of
>>>> about 180GB vs the 1GB of actual size.
>>>>
>>>> Also, any chance you're using XFS here? XFS sometimes likes to
>>>> preallocate large amounts of data into files while they're open, and only
>>>> frees it up if disk space is contended. I think you can use 'xfs_bmap' on
>>>> an index file to see the allocation status, which might be interesting.
>>>>
>>>> -Todd
>>>>
>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov <mr.xkurt@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi guys!
>>>>>
>>>>> We want to buy SSDs for the TServer WALs in our cluster. I'm working on
>>>>> capacity estimation for these SSDs using the "Getting Started with Kudu" book,
>>>>> Chapter 4, Write-Ahead Log (
>>>>> https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html
>>>>> <https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html#idm139738927926240>
>>>>> ).
>>>>>
>>>>> NB: we use default Kudu WAL configuration settings.
>>>>>
>>>>> There is a formula for the worst case:
>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>
>>>>> So, this formula takes into account only segment files. But in our
>>>>> cluster, I see that every segment file has >= 1 corresponding index
>>>>> files. And every index file is actually larger than a segment file.
>>>>>
>>>>> Numbers from one of our nodes.
>>>>> WALs count:
>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>> 711
>>>>>
>>>>> Overall WAL size:
>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>
>>>>> Size of all segment files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>> 6.1G    total
>>>>>
>>>>> Size of all index files:
>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>> 6.5G    total
>>>>>
>>>>> So I have questions.
>>>>>
>>>>> 1. How can I estimate the size of index files?
>>>>> It looks like in our cluster the size of the index files is approximately
>>>>> equal to the size of the segment files.
>>>>>
>>>>> 2. There are some WALs with more than one index file. For example:
>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>> total 296M
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>
>>>>> Is this a normal situation?
>>>>>
>>>>> 3. Not a question. Please consider adding documentation about
>>>>> estimating WAL storage. Also, I couldn't find any mention of index
>>>>> files, except here:
>>>>> https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> with best regards, Pavel Martynov
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>> --
>>> with best regards, Pavel Martynov
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
> --
> with best regards, Pavel Martynov
>


-- 
Todd Lipcon
Software Engineer, Cloudera
