kudu-user mailing list archives

From "阿香" <1654407...@qq.com>
Subject Re: About data file size and on-disk size
Date Thu, 24 Nov 2016 14:39:54 GMT
> https://issues.apache.org/jira/browse/KUDU-624


bytes_under_management is much smaller than the total size of the data files.
So is there a lot of "orphaned" data? Can this space be GCed?


# du -sk /data/kudu/tserver/data/*.data | perl -n -e 'if (/^(\d+)/) { $x += $1; } END { print $x * 1024 . "\n"; }'


235763863552


# curl -s http://localhost:8050/jsonmetricz | grep -A1 bytes_under


"name": "log_block_manager_bytes_under_management",
"value": 41110087979
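For reference, the gap between those two numbers can be computed directly (values copied from the outputs above):

```shell
# Difference between the total on-disk size of the .data files (du)
# and the tserver's log_block_manager_bytes_under_management metric,
# using the two numbers reported above.
on_disk=235763863552
managed=41110087979
gap=$((on_disk - managed))
echo "unaccounted: ${gap} bytes (~$((gap / 1024 / 1024 / 1024)) GiB)"
```

So roughly 181 GiB is on disk but not counted as under management.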


> The default value for --tablet_history_max_age_sec (which controls how old the data must be before it is removed) is 15 minutes; have you changed the value of this flag?


I have not changed the value of tablet_history_max_age_sec (it is still the default 900 seconds).


> Look for references to MajorDeltaCompactionOp. If there aren't any, that means Kudu isn't getting opportunities to age out old data.


There are MajorDeltaCompactionOp entries in the log file:


I1122 15:11:44.777091  4169 maintenance_manager.cc:367] MajorDeltaCompactionOp(5bb6d219c739433db4a61fe2b41a2ed0) metrics: {"cfile_cache_miss":8,"cfile_cache_miss_bytes":1271,"cfile_init":4,"delta_iterators_relevant":2,"fdatasync":7,"fdatasync_us":21036,"lbm root 0.queue_time_us":20,"lbm_read_time_us":4095,"lbm_reads_1-10_ms":1,"lbm_reads_lt_1ms":23,"lbm_write_time_us":66,"lbm_writes_lt_1ms":16,"tcmalloc_contention_cycles":66816}


I1122 15:12:12.564499  4169 maintenance_manager.cc:361] Time spent running MajorDeltaCompactionOp(5bb6d219c739433db4a61fe2b41a2ed0):
real 0.051s	user 0.018s	sys 0.001s
I1122 15:12:12.564540  4169 maintenance_manager.cc:367] MajorDeltaCompactionOp(5bb6d219c739433db4a61fe2b41a2ed0) metrics: {"cfile_cache_miss":14,"cfile_cache_miss_bytes":4413,"cfile_init":5,"delta_iterators_relevant":4,"fdatasync":7,"fdatasync_us":10123,"lbm root 0.queue_time_us":24,"lbm_read_time_us":8889,"lbm_reads_1-10_ms":4,"lbm_reads_lt_1ms":30,"lbm_write_time_us":78,"lbm_writes_lt_1ms":16,"tcmalloc_contention_cycles":94976}


I1122 15:12:12.803815  4169 maintenance_manager.cc:361] Time spent running MajorDeltaCompactionOp(5bb6d219c739433db4a61fe2b41a2ed0):
real 0.036s	user 0.014s	sys 0.001s
I1122 15:12:12.803866  4169 maintenance_manager.cc:367] MajorDeltaCompactionOp(5bb6d219c739433db4a61fe2b41a2ed0) metrics: {"cfile_cache_miss":8,"cfile_cache_miss_bytes":393,"cfile_init":4,"delta_iterators_relevant":1,"fdatasync":7,"fdatasync_us":9947,"lbm root 0.queue_time_us":42,"lbm_read_time_us":4346,"lbm_reads_1-10_ms":3,"lbm_reads_lt_1ms":21,"lbm_write_time_us":50,"lbm_writes_lt_1ms":16}


------------------ Original Message ------------------
From: "Adar Dembo" <adar@cloudera.com>
Sent: Thursday, November 24, 2016, 6:30 AM
To: "user" <user@kudu.apache.org>

Subject: Re: About data file size and on-disk size



The difference between du with --apparent-size and without suggests
that hole punching is working properly. Quick back of the envelope
math shows that with 8133 containers, each container is just over 10G
of "apparent size", which means nearly all of the containers were full
at one point or another. That makes sense; it means that Kudu is
generally writing to a small number of containers at any given time,
but is filling them up over time.
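The sparse-file behavior described here is easy to reproduce by hand. A minimal sketch (the temp-file path is arbitrary, and `fallocate --punch-hole` needs a filesystem that supports it, e.g. ext4 or xfs):

```shell
# Write a 64 MiB file, punch a 32 MiB hole in the middle, and compare
# apparent size (unchanged) with actual disk usage (roughly halved).
f=$(mktemp /tmp/sparse-demo.XXXXXX)
dd if=/dev/zero of="$f" bs=1M count=64 status=none
fallocate --punch-hole --keep-size \
          --offset $((16 * 1024 * 1024)) --length $((32 * 1024 * 1024)) "$f"
du -k --apparent-size "$f"   # still 65536 KiB
du -k "$f"                   # smaller: the hole no longer occupies disk
rm -f "$f"
```

This is exactly the gap you see between du with and without --apparent-size on a Kudu data directory.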

I took a look at the tablet disk estimation code and found that it
excludes the size of all of the UNDO data blocks. I think this is
because the size estimation is also used to drive decisions regarding
delta compaction, but with an UPSERT-only workload like yours, we'd
expect to see many UNDO data blocks over time as updated (and now
historical) data is further and further compacted. I filed
https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
However, if this were the case, I'd expect the "tablet history GC"
feature (new in Kudu 1.0) to remove old data that was mutated in an
UPSERT. The default value for --tablet_history_max_age_sec (which
controls how old the data must be before it is removed) is 15 minutes;
have you changed the value of this flag? If not, could you look at
your tserver log for the presence of major delta compactions? Look for
references to MajorDeltaCompactionOp. If there aren't any, that means
Kudu isn't getting opportunities to age out old data.
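Something like the following works for that check (the log path is an assumption for a typical packaged install; substitute your own --log_dir):

```shell
# Count MajorDeltaCompactionOp occurrences in the tserver's glog output.
grep -c 'MajorDeltaCompactionOp' /var/log/kudu/kudu-tserver.INFO
```

A count of zero would mean no major delta compactions have run since the log rolled.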

It's also possible that simply not accounting for the composite index
and bloom blocks (see KUDU-1755) is the reason. Take a look at
https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
and run the same two commands to compare the total on-disk size of all
the .data files to the number of bytes that the tserver is aware of.
If the two numbers are close, it's a sign that, at the very least,
Kudu is aware of and actively managing all that disk space (i.e.
there's no "orphaned" data).
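The two commands from that comment can be combined into one sketch (the data dir and tserver port are the ones used earlier in this thread; adjust for your deployment):

```shell
# Sum the on-disk size of all .data files and fetch the tserver's
# bytes_under_management metric, for a side-by-side comparison.
DATA_DIR=/data/kudu/tserver/data
TSERVER=http://localhost:8050

on_disk=$(du -sk "$DATA_DIR"/*.data | awk '{sum += $1} END {print sum * 1024}')
managed=$(curl -s "$TSERVER/jsonmetricz" \
  | grep -A1 log_block_manager_bytes_under_management \
  | grep '"value"' | tr -dc '0-9')

echo "on-disk bytes of .data files: $on_disk"
echo "bytes_under_management:       $managed"
```

If the two numbers are close, the tserver is managing essentially all of that space.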



On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <1654407779@qq.com> wrote:
> Hi,
>
>> Can you tell us a little bit more about your table, as well as any deleted
>> tables you once had? How many columns did they have?
>
> I have not deleted any tables before.
> There is only one table, with 12 columns (string and int), in the Kudu cluster.
> The cluster has three tablet servers.
>
> I use the upsert operation to insert and update rows.
>
>> what version of Kudu are you using?
>
> kudu -version
> kudu 1.0.0
> revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
> build type RELEASE
> built by jenkins at 16 Sep 2016 00:23:10 PST on
> impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
> build id 2016-09-16_00-03-04
>
>> It's conceivable that there's a pathological case wherein each of the 8133
>> data files is used, one at a time, to store data blocks, which would cause
>> each to allocate 32 MB of disk space (totaling about 254G).
>
> Can the number of data files be decreased? The SSD disk is almost out of
> space now.
>
>> Can you try running du with --apparent-size and compare the results?
>
> # du -sh /data/kudu/tserver/data/
> 213G /data/kudu/tserver/data/
> # du -sh --apparent-size  /data/kudu/tserver/data/
> 81T /data/kudu/tserver/data/
>
>> What filesystem is being used for /data/kudu/tserver/data?
>
> # file -s /dev/vdb1
> /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
> UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery) (extents)
> (64bit) (large files) (huge files)
>
>
> Thanks.
>
>
> ------------------ Original Message ------------------
> From: "Adar Dembo" <adar@cloudera.com>
> Sent: Wednesday, November 23, 2016, 9:35 AM
> To: "user" <user@kudu.apache.org>
> Subject: Re: About data file size and on-disk size
>
> Also, if you haven't explicitly disabled it, each .data file is going
> to preallocate 32 MB of data when used. It's conceivable that there's
> a pathological case wherein each of the 8133 data files is used, one
> at a time, to store data blocks, which would cause each to allocate 32
> MB of disk space (totaling about 254G).
>
> Can you tell us a little bit more about your table, as well as any
> deleted tables you once had? How many columns did they have? Also,
> what version of Kudu are you using?
>
> On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <adar@cloudera.com> wrote:
>> The files in /data/kudu/tserver/data are supposed to be sparse; that
>> is, when Kudu decides to delete data, it'll punch a hole in one of
>> those files, allowing the filesystem to reclaim the space in that
>> hole. Yet, 'du' should reflect that because it measures real space
>> usage. Can you try running du with --apparent-size and compare the
>> results? If they're the same or similar, it suggests that the hole
>> punching behavior isn't working properly. What distribution are you
>> using? What filesystem is being used for /data/kudu/tserver/data?
>>
>> You should also check if maybe Kudu has failed to delete the data
>> belonging to deleted tables. Has this tserver hosted any tablets
>> belonging to tables that have since been deleted? Does the tserver log
>> describe any errors when trying to delete the data belonging to those
>> tablets?
>>
>> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407779@qq.com> wrote:
>>> Hi,
>>>
>>>
>>> I have a table with 16 buckets over 3 physical machines. Each tablet has
>>> only one replica.
>>>
>>>
>>> The Tablets Web UI shows that each tablet has around 4.5G on-disk size.
>>>
>>> On one machine there are 8 tablets in total, so the on-disk size should be
>>> about 4.5 * 8 = 36G.
>>>
>>> However, on the same machine, the disk space actually used is about 211G.
>>>
>>>
>>> # du -sh /data/kudu/tserver/data/
>>>
>>> 210G /data/kudu/tserver/data/
>>>
>>>
>>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>>>
>>> 8133
>>>
>>>
>>>
>>> What's the difference between the data file size and the on-disk size?
>>>
>>> Can the files in /data/kudu/tserver/data/ be compacted, purged, or can some
>>> of them be deleted?
>>>
>>>
>>> Thanks very much.
>>>
>>>
>>> BR
>>>
>>> Brooks
>>>
>>>
>>>