kudu-user mailing list archives

From Jason Heo <jason.heo....@gmail.com>
Subject Re: Physical Tablet Data size is larger than size in Chart Library.
Date Thu, 13 Apr 2017 04:12:20 GMT
Hi Dan.

Thank you for your kind reply.

My Kudu cluster runs on CentOS 7.2 with XFS.

I'll try `kudu fs check`.

Thanks,

Jason

2017-04-13 5:47 GMT+09:00 Dan Burkert <danburkert@apache.org>:

> Adar has told me it's fine to run the new 'kudu fs check' tool against a
> Kudu 1.2 server.  It will require building locally, though.
>
> - Dan
>
> On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert <danburkert@apache.org>
> wrote:
>
>> Hi Jason,
>>
>> First question: what filesystem and OS are you running?
>>
>> This has been an ongoing area of work; we fixed a few major issues in
>> 1.2, and a few more major issues in 1.3, and have a new tool ('kudu fs
>> check') that will be released in 1.4 to diagnose and fix further issues.
>> In some cases we are underestimating the true size of the data, and in some
>> cases we are keeping around data that could be cleaned up.  I've included a
>> list of relevant JIRAs below if you are interested in specifics.  It should
>> be possible to get early access to the 'kudu fs check' tool by compiling
>> Kudu locally, but I'm going to defer to Adar on that, since he's the
>> resident expert on the subject.
>>
>> KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
>> KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
>> KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
>> KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
>>
>>
>> On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo <jason.heo.sde@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I'm using Apache Kudu 1.2 on CDH 1.2.
>>>
>>> I'm estimating how many servers are needed to store my data.
>>>
>>> After loading my test data sets, total_kudu_on_disk_size_across_kudu_replicas
>>> in the CDH chart library is 27.9TB, whereas the sum of
>>> `du -sh /path/to/tablet_data/data` across all nodes is 39.9TB, which is
>>> 43% larger than the chart library figure.
>>>
>>> I also observed the same difference on another Kudu test cluster of mine.
>>>
>>> I'm curious whether this is normal, and I'd like to know if there is a way
>>> to reduce the physical file size.
>>>
>>> Thanks,
>>>
>>> Jason.
>>>
>>
>
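
To reproduce the comparison in the original question on a single node, the following Python sketch sums allocated space under a directory roughly the way `du` does (using `st_blocks`, which Linux reports in 512-byte units, and counting hard-linked files only once), and computes how much larger one total is than another. It is a hypothetical illustration, not a Kudu tool: the function names `allocated_bytes` and `percent_larger` are made up for this example, and it does not replace `kudu fs check`.

```python
import os


def allocated_bytes(root: str) -> int:
    """Approximate `du` for regular files under root: sum allocated blocks.

    Assumes a Unix-like OS where os.lstat() exposes st_blocks in 512-byte units.
    Directory entries themselves are not counted, so this is a rough estimate.
    """
    total = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:  # count hard-linked files once, as du does
                continue
            seen.add(key)
            total += st.st_blocks * 512
    return total


def percent_larger(physical: float, reported: float) -> float:
    """Return how much larger `physical` is than `reported`, in percent."""
    return (physical / reported - 1.0) * 100.0


if __name__ == "__main__":
    # Totals quoted in the thread: 39.9 TB from `du` vs 27.9 TB in the chart library.
    print(f"{percent_larger(39.9, 27.9):.1f}%")
```

With the figures from the thread, 39.9 / 27.9 is about 1.430, which matches the 43% difference Jason reported.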
