kudu-user mailing list archives

From "阿香" <1654407...@qq.com>
Subject Re: About data file size and on-disk size
Date Wed, 23 Nov 2016 08:39:06 GMT
Hi, 


> Can you tell us a little bit more about your table, as well as any deleted tables you once had? How many columns did they have?


I have not deleted any tables.
There is only one table in the Kudu cluster, with 12 columns (string and int).
This cluster has three tablet servers.


I use the upsert operation to insert and update rows.


> what version of Kudu are you using?


kudu -version
kudu 1.0.0
revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
build type RELEASE
built by jenkins at 16 Sep 2016 00:23:10 PST on impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
build id 2016-09-16_00-03-04


> It's conceivable that there's a pathological case wherein each of the 8133 data files is used, one at a time, to store data blocks, which would cause each to allocate 32 MB of disk space (totaling about 254G).


Can the number of data files be reduced? The SSD is almost out of space now.


> Can you try running du with --apparent-size and compare the results? 


# du -sh /data/kudu/tserver/data/
213G	/data/kudu/tserver/data/
# du -sh --apparent-size  /data/kudu/tserver/data/
81T	/data/kudu/tserver/data/
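
For reference, the two du figures correspond to two different values that the filesystem reports for each file. A minimal Python sketch of the same comparison, assuming a Unix-like system where os.lstat exposes st_blocks in 512-byte units (the function name usage_bytes is illustrative, and directories themselves are not counted, so totals can differ slightly from du):

```python
import os

def usage_bytes(root):
    """Return (apparent, allocated) byte totals for the files under root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size            # logical length, as in du --apparent-size
            allocated += st.st_blocks * 512   # space actually allocated, as in plain du
    return apparent, allocated
```

Calling usage_bytes("/data/kudu/tserver/data") would return both totals in bytes. A large gap between them, as in the du output above (81T apparent versus 213G allocated), is what sparse files look like.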


> What filesystem is being used for /data/kudu/tserver/data?


# file -s /dev/vdb1
/dev/vdb1: Linux rev 1.0 ext4 filesystem data, UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery) (extents) (64bit) (large files) (huge files)





Thanks.




------------------ Original Message ------------------
From: "Adar Dembo" <adar@cloudera.com>
Sent: Wednesday, November 23, 2016, 9:35 AM
To: "user" <user@kudu.apache.org>

Subject: Re: About data file size and on-disk size



Also, if you haven't explicitly disabled it, each .data file is going
to preallocate 32 MB of data when used. It's conceivable that there's
a pathological case wherein each of the 8133 data files is used, one
at a time, to store data blocks, which would cause each to allocate 32
MB of disk space (totaling about 254G).
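
The estimate above can be checked with a little arithmetic. A minimal Python sketch, assuming "32 MB" means 32 MiB and "G" means GiB (the variable names are illustrative, not Kudu identifiers):

```python
# Worst case: every .data file has preallocated one 32 MiB chunk.
DATA_FILES = 8133      # count reported by: find ... -name "*.data" | wc -l
PREALLOC_MIB = 32      # assumption: "32 MB" means 32 MiB per file

total_gib = DATA_FILES * PREALLOC_MIB / 1024
print(f"{total_gib:.1f} GiB")  # 254.2 GiB
```

That ceiling is somewhat above the roughly 210G that du reports for the data directory.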

Can you tell us a little bit more about your table, as well as any
deleted tables you once had? How many columns did they have? Also,
what version of Kudu are you using?

On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <adar@cloudera.com> wrote:
> The files in /data/kudu/tserver/data are supposed to be sparse; that
> is, when Kudu decides to delete data, it'll punch a hole in one of
> those files, allowing the filesystem to reclaim the space in that
> hole. Yet, 'du' should reflect that because it measures real space
> usage. Can you try running du with --apparent-size and compare the
> results? If they're the same or similar, it suggests that the hole
> punching behavior isn't working properly. What distribution are you
> using? What filesystem is being used for /data/kudu/tserver/data?
>
> You should also check if maybe Kudu has failed to delete the data
> belonging to deleted tables. Has this tserver hosted any tablets
> belonging to tables that have since been deleted? Does the tserver log
> describe any errors when trying to delete the data belonging to those
> tablets?
>
> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407779@qq.com> wrote:
>> Hi,
>>
>>
>> I have a table with 16 buckets spread over 3 physical machines. Each tablet has
>> only one replica.
>>
>>
>> The Tablets page of the Web UI shows that each tablet has an on-disk size of around 4.5G.
>>
>> On one machine there are 8 tablets in total, so the on-disk size is about
>> 4.5*8 = 36G.
>>
>> However, on the same machine, the disk space actually used is about 211G.
>>
>>
>> # du -sh /data/kudu/tserver/data/
>>
>> 210G /data/kudu/tserver/data/
>>
>>
>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>>
>> 8133
>>
>>
>>
>> What is the difference between the data file size and the on-disk size?
>>
>> Can the files in /data/kudu/tserver/data/ be compacted or purged, or can some of
>> them be deleted?
>>
>>
>> Thanks very much.
>>
>>
>> BR
>>
>> Brooks
>>
>>
>>