kudu-user mailing list archives

From Adar Dembo <a...@cloudera.com>
Subject Re: About data file size and on-disk size
Date Tue, 22 Nov 2016 19:39:52 GMT
The files in /data/kudu/tserver/data are supposed to be sparse; that
is, when Kudu decides to delete data, it'll punch a hole in one of
those files, allowing the filesystem to reclaim the space in that
hole. 'du' should already reflect that, because by default it measures
the space actually allocated on disk rather than the apparent file
size. Can you try running du with --apparent-size and compare the
results? If they're the same or similar, it suggests that the hole
punching behavior isn't working properly. What distribution are you
using? What filesystem is being used for /data/kudu/tserver/data?
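
For illustration, here is a minimal sketch of that comparison. It
assumes GNU du (for --apparent-size); the function name sparse_report
is made up for this example and is not part of Kudu.

```shell
# Print the allocated size and the apparent size of a directory, in KiB,
# separated by a tab. Hypothetical helper, not a Kudu tool.
sparse_report() {
    dir="$1"
    # Default du output: blocks actually allocated on disk.
    allocated_kb=$(du -sk "$dir" | cut -f1)
    # --apparent-size (GNU du): sum of file lengths, holes included.
    apparent_kb=$(du -sk --apparent-size "$dir" | cut -f1)
    printf '%s\t%s\n' "$allocated_kb" "$apparent_kb"
}

# Example usage, with the path from this thread:
#   sparse_report /data/kudu/tserver/data
#   df -T /data/kudu/tserver/data    # also shows the filesystem type
```

If the first number is much smaller than the second, the files are
sparse and the holes are being reclaimed. If the two are close, the
punched holes are probably not being freed.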

You should also check whether Kudu has failed to delete the data
belonging to deleted tables. Has this tserver hosted any tablets
belonging to tables that have since been deleted? Does the tserver log
show any errors from attempts to delete the data belonging to those
tablets?

On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407779@qq.com> wrote:
> Hi,
>
>
> I have a table with 16 buckets over 3 physical machines. Each tablet has
> only one replica.
>
>
> The Tablets Web UI shows that each tablet has around 4.5G on-disk size.
>
> On one machine there are 8 tablets in total, so the on-disk size is about
> 4.5*8 = 36G.
>
> However, on the same machine, the disk space actually used is about 211G.
>
>
> # du -sh /data/kudu/tserver/data/
>
> 210G /data/kudu/tserver/data/
>
>
> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>
> 8133
>
>
>
> What's the difference between the data file size and the on-disk size?
>
> Can the files in /data/kudu/tserver/data/ be compacted or purged, or can
> some of them be deleted?
>
>
> Thanks very much.
>
>
> BR
>
> Brooks
>
>
>
