kudu-user mailing list archives

From Adar Dembo <a...@cloudera.com>
Subject Re: About data file size and on-disk size
Date Wed, 30 Nov 2016 02:00:51 GMT
If you're comfortable rebuilding Kudu from source, you can apply
https://gerrit.cloudera.org/#/c/5254, rebuild the tserver, and restart it.
Once the tserver is done restarting, it should trim the empty space off of
the ends of all of your container data files.

Otherwise, you'll have to wait until the next Kudu release.
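As a rough sketch of what that trim amounts to (not Kudu's actual code; the file name and sizes below are invented for illustration), a container whose logical size was extended beyond its last written byte can be cut back with truncate:

```shell
# Hypothetical demo: demo.data stands in for a container file.
dd if=/dev/zero of=demo.data bs=1M count=2 status=none  # 2 MiB of real data
truncate -s 32M demo.data   # simulate leftover preallocated tail space
stat -c %s demo.data        # logical size is now 33554432 bytes
truncate -s 2M demo.data    # trim the empty tail, as the patched tserver does
stat -c %s demo.data        # back to 2097152 bytes
rm demo.data
```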

On Tue, Nov 29, 2016 at 5:48 PM, 阿香 <1654407779@qq.com> wrote:

>
> Hi Todd,
>
> Thanks.
> From the results, it looks like you successfully reproduced the bug.
> By the way, can I reclaim the wasted disk space?
>
>
> # du -sm 542d51e55d524034a5274600c31abd11.data
> 29 542d51e55d524034a5274600c31abd11.data
>
> # filefrag -v -b 542d51e55d524034a5274600c31abd11.data
>
> filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
> Filesystem type is: ef53
> File size of 542d51e55d524034a5274600c31abd11.data is 10767867904
> (10515496 blocks of 1024 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected:
> flags:
>    0: 10486144..10497543:  278086588.. 278097987:  11400:
> unwritten
>    1: 10497544..10514191:  278691588.. 278708235:  16648:  278097988:
> unwritten
>    2: 10514192..10514199:  279581160.. 279581167:      8:  278708236:
> unwritten
>    3: 10514200..10514203:  280291284.. 280291287:      4:  279581168:
> unwritten
>    4: 10514204..10514227:  280652252.. 280652275:     24:  280291288:
> unwritten
>    5: 10514228..10515259:  281289216.. 281290247:   1032:  280652276:
> unwritten
>    6: 10515260..10515263:  282068816.. 282068819:      4:  281290248:
> unwritten
>    7: 10515264..10515495:  283429184.. 283429415:    232:  282068820:
> unwritten,eof
> 542d51e55d524034a5274600c31abd11.data: 8 extents found
>
> # echo $[11400 + 16648 + 1032 + 232]
> 29312
>
> # ls -l 542d51e55d524034a5274600c31abd11.data
> -rw-r--r-- 1 kudu kudu 10767867904 Oct 26 06:51
> 542d51e55d524034a5274600c31abd11.data
>
> # ls -lh 542d51e55d524034a5274600c31abd11.data
> -rw-r--r-- 1 kudu kudu 11G Oct 26 06:51 542d51e55d524034a5274600c31abd11.data
>
> BR
> -GU
>
> ------------------ Original Message ------------------
> *From:* "Todd Lipcon" <todd@cloudera.com>
> *Sent:* Tuesday, November 29, 2016, 4:15 AM
> *To:* "user" <user@kudu.apache.org>
> *Subject:* Re: About data file size and on-disk size
>
> Hi Xiang,
>
> Adar and I did some investigation and came up with a likely cause:
> https://issues.apache.org/jira/browse/KUDU-1764
>
> Can you please try the following on one of your .data files? (preferably
> one with a modification time from a few weeks ago)
>
> $ du -sm abcdef.data
> $ filefrag -v -b abcdef.data
> $ ls -l abcdef.data
>
> We can use this to confirm whether you are hitting the same bug we just
> discovered.
>
> Thanks
> -Todd
>
> On Thu, Nov 24, 2016 at 6:57 AM, 阿香 <1654407779@qq.com> wrote:
>
>>
>> > If the workload doesn't involve normal (merging) compactions, then
>> UNDOs won't be GCed at all. So, if you have a relatively static set of
>> keys, and are just updating them without causing many new inserts, this
>> could be the problem.
>>
>> The keys are not static; new keys are being inserted all the time.
>> The key of the table is a UUID string with hash partitioning (16 buckets).
>> Currently there are about 1,000,000,000 rows in this cluster.
>>
>> Will these big data files increase the latency of upsert operations?
>>
>> I saw the following metrics in the Kudu web UI.
>>
>>             {
>>                 "name": "write_op_duration_client_propagated_consistency",
>>                 "total_count": 8568729,
>>                 "min": 116,
>>                 "mean": 2499.56,
>>                 "percentile_75": 2176,
>>                 "percentile_95": 7680,
>>                 "percentile_99": 29568,
>>                 "percentile_99_9": 78336,
>>                 "percentile_99_99": 123904,
>>                 "max": 1562967,
>>                 "total_sum": 21418050385
>>             }
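Those histogram fields are self-consistent (assuming the values are in microseconds, as Kudu duration metrics generally are): the mean is just total_sum divided by total_count.

```shell
# mean = total_sum / total_count (microseconds)
echo $((21418050385 / 8568729))   # 2499, i.e. ~2.5 ms per write op
```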
>>
>>
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Todd Lipcon" <todd@cloudera.com>
>> *Sent:* Thursday, November 24, 2016, 11:55 AM
>> *To:* "user" <user@kudu.apache.org>
>> *Subject:* Re: About data file size and on-disk size
>>
>> On Wed, Nov 23, 2016 at 2:30 PM, Adar Dembo <adar@cloudera.com> wrote:
>>
>>> The difference between du with --apparent-size and without suggests
>>> that hole punching is working properly. Quick back of the envelope
>>> math shows that with 8133 containers, each container is just over 10G
>>> of "apparent size", which means nearly all of the containers were full
>>> at one point or another. That makes sense; it means that Kudu is
>>> generally writing to a small number of containers at any given time,
>>> but is filling them up over time.
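That back-of-envelope figure can be checked directly: 81 TiB of apparent size spread over 8133 containers comes to a little over 10 GiB each.

```shell
# apparent size per container, in MiB: 81 TiB / 8133 containers
echo $((81 * 1024 * 1024 / 8133))   # 10443 MiB, i.e. just over 10 GiB
```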
>>>
>>> I took a look at the tablet disk estimation code and found that it
>>> excludes the size of all of the UNDO data blocks. I think this is
>>> because the size estimation is also used to drive decisions regarding
>>> delta compaction, but with an UPSERT-only workload like yours, we'd
>>> expect to see many UNDO data blocks over time as updated (and now
>>> historical) data is further and further compacted. I filed
>>> https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
>>> However, if this were the case, I'd expect the "tablet history GC"
>>> feature (new in Kudu 1.0) to remove old data that was mutated in an
>>> UPSERT. The default value for --tablet_history_max_age_sec (which
>>> controls how old the data must be before it is removed) is 15 minutes;
>>> have you changed the value of this flag? If not, could you look at
>>> your tserver log for the presence of major delta compactions? Look for
>>> references to MajorDeltaCompactionOp. If there aren't any, that means
>>> Kudu isn't getting opportunities to age out old data.
>>>
>>
>> Worth noting that major delta compaction doesn't actually remove old
>> UNDOs. There are still some open JIRAs about scheduling tasks to age-off
>> UNDOs, but as it stands today, they only get collected during a normal
>> compaction.
>>
>> If the workload doesn't involve normal (merging) compactions, then UNDOs
>> won't be GCed at all. So, if you have a relatively static set of keys, and
>> are just updating them without causing many new inserts, this could be the
>> problem.
>>
>>
>>>
>>> It's also possible that simply not accounting for the composite index
>>> and bloom blocks (see KUDU-1755) is the reason. Take a look at
>>> https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
>>> and run the same two commands to compare the total on-disk size of all
>>> the .data files to the number of bytes that the tserver is aware of.
>>> If the two numbers are close, it's a sign that, at the very least,
>>> Kudu is aware of and actively managing all that disk space (i.e.
>>> there's no "orphaned" data).
>>>
>>
>> -Todd
>>
>>
>>>
>>>
>>>
>>> On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <1654407779@qq.com> wrote:
>>> > Hi,
>>> >
>>> >> Can you tell us a little bit more about your table, as well as any
>>> >> deleted tables you once had? How many columns did they have?
>>> >
>>> > I have not deleted any tables.
>>> > There is only one table, with 12 columns (string and int), in the Kudu
>>> > cluster. This cluster has three tablet servers.
>>> >
>>> > I use the upsert operation to insert and update rows.
>>> >
>>> >> what version of Kudu are you using?
>>> >
>>> > kudu -version
>>> > kudu 1.0.0
>>> > revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
>>> > build type RELEASE
>>> > built by jenkins at 16 Sep 2016 00:23:10 PST on
>>> > impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
>>> > build id 2016-09-16_00-03-04
>>> >
>>> >> It's conceivable that there's a pathological case wherein each of the 8133
>>> >> data files is used, one at a time, to store data blocks, which would cause
>>> >> each to allocate 32 MB of disk space (totaling about 254G).
>>> >
>>> > Can the number of data files be decreased? The SSD disk is almost out of
>>> > space now.
>>> >
>>> >> Can you try running du with --apparent-size and compare the results?
>>> >
>>> > # du -sh /data/kudu/tserver/data/
>>> > 213G /data/kudu/tserver/data/
>>> > # du -sh --apparent-size  /data/kudu/tserver/data/
>>> > 81T /data/kudu/tserver/data/
>>> >
>>> >> What filesystem is being used for /data/kudu/tserver/data?
>>> >
>>> > # file -s /dev/vdb1
>>> > /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
>>> > UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery)
>>> > (extents) (64bit) (large files) (huge files)
>>> >
>>> >
>>> > Thanks.
>>> >
>>> >
>>> > ------------------ Original Message ------------------
>>> > From: "Adar Dembo" <adar@cloudera.com>
>>> > Sent: Wednesday, November 23, 2016, 9:35 AM
>>> > To: "user" <user@kudu.apache.org>
>>> > Subject: Re: About data file size and on-disk size
>>> >
>>> > Also, if you haven't explicitly disabled it, each .data file is going
>>> > to preallocate 32 MB of data when used. It's conceivable that there's
>>> > a pathological case wherein each of the 8133 data files is used, one
>>> > at a time, to store data blocks, which would cause each to allocate 32
>>> > MB of disk space (totaling about 254G).
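The 254G figure follows directly from 8133 files at 32 MB of preallocation each:

```shell
# 8133 data files x 32 MiB preallocation apiece
echo $((8133 * 32))          # 260256 MiB
echo $((8133 * 32 / 1024))   # 254 GiB
```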
>>> >
>>> > Can you tell us a little bit more about your table, as well as any
>>> > deleted tables you once had? How many columns did they have? Also,
>>> > what version of Kudu are you using?
>>> >
>>> > On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <adar@cloudera.com> wrote:
>>> >> The files in /data/kudu/tserver/data are supposed to be sparse; that
>>> >> is, when Kudu decides to delete data, it'll punch a hole in one of
>>> >> those files, allowing the filesystem to reclaim the space in that
>>> >> hole. Yet, 'du' should reflect that because it measures real space
>>> >> usage. Can you try running du with --apparent-size and compare the
>>> >> results? If they're the same or similar, it suggests that the hole
>>> >> punching behavior isn't working properly. What distribution are you
>>> >> using? What filesystem is being used for /data/kudu/tserver/data?
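The sparse-file behavior described above can be reproduced on a scratch file with standard tools (a minimal demo, not Kudu itself; the fallocate step needs a filesystem with hole-punch support, such as ext4 or XFS):

```shell
# Write real data, then punch it out: the apparent size stays put while the
# allocated size ('du' without --apparent-size) drops.
dd if=/dev/zero of=hole_demo.data bs=1M count=8 status=none
du -k hole_demo.data                    # ~8192 KiB allocated
fallocate --punch-hole --offset 0 --length 8M hole_demo.data \
  || echo "punch-hole unsupported on this filesystem"
du -k --apparent-size hole_demo.data    # still 8192 KiB apparent
rm hole_demo.data
```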
>>> >>
>>> >> You should also check if maybe Kudu has failed to delete the data
>>> >> belonging to deleted tables. Has this tserver hosted any tablets
>>> >> belonging to tables that have since been deleted? Does the tserver log
>>> >> describe any errors when trying to delete the data belonging to those
>>> >> tablets?
>>> >>
>>> >> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407779@qq.com> wrote:
>>> >>> Hi,
>>> >>>
>>> >>>
>>> >>> I have a table with 16 buckets over 3 physical machines. Each tablet
>>> >>> has only one replica.
>>> >>>
>>> >>>
>>> >>> Tablets Web UI shows that each tablet has around ~4.5G on-disk size.
>>> >>>
>>> >>> On one machine, there are 8 tablets in total, so the on-disk size is
>>> >>> about 4.5*8 = 36G.
>>> >>>
>>> >>> However, on the same machine, the disk space actually used is about 211G.
>>> >>>
>>> >>>
>>> >>> # du -sh /data/kudu/tserver/data/
>>> >>>
>>> >>> 210G /data/kudu/tserver/data/
>>> >>>
>>> >>>
>>> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>>> >>>
>>> >>> 8133
>>> >>>
>>> >>>
>>> >>>
>>> >>> What's the difference between the data file size and the on-disk size?
>>> >>>
>>> >>> Can files in /data/kudu/tserver/data/ be compacted, purged, or can some
>>> >>> of them be deleted?
>>> >>>
>>> >>>
>>> >>> Thanks very much.
>>> >>>
>>> >>>
>>> >>> BR
>>> >>>
>>> >>> Brooks
>>> >>>
>>> >>>
>>> >>>
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
