kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: About data file size and on-disk size
Date Mon, 28 Nov 2016 20:15:34 GMT
Hi Xiang,

Adar and I did some investigation and came up with a likely cause:
https://issues.apache.org/jira/browse/KUDU-1764

Can you please try the following on one of your .data files? (Preferably
one whose modification time is a few weeks old.)

$ du -sm abcdef.data
$ filefrag -v -b abcdef.data
$ ls -l abcdef.data

We can use this to confirm whether you are hitting the same bug we just
discovered.
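
In case it's handy, here is a one-liner that boils the same comparison down
to apparent vs. allocated bytes (just a sketch: it assumes GNU stat and uses
the same placeholder file name as above):

$ f=abcdef.data
$ echo "apparent: $(stat -c %s "$f") bytes, allocated: $(( $(stat -c %b "$f") * $(stat -c %B "$f") )) bytes"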

Thanks
-Todd

On Thu, Nov 24, 2016 at 6:57 AM, 阿香 <1654407779@qq.com> wrote:

>
> > If the workload doesn't involve normal (merging) compactions, then UNDOs
> > won't be GCed at all. So, if you have a relatively static set of keys, and
> > are just updating them without causing many new inserts, this could be the
> > problem.
>
> The keys are not relatively static; new keys are added all the time.
> The table's key is a UUID string with hash partitioning (16 buckets).
> Currently there are about 1,000,000,000 rows in this cluster.
>
> Will these big data files increase the latency of upsert operations?
>
> I saw metrics like the following in the Kudu web UI.
>
>             {
>                 "name": "write_op_duration_client_propagated_consistency",
>                 "total_count": 8568729,
>                 "min": 116,
>                 "mean": 2499.56,
>                 "percentile_75": 2176,
>                 "percentile_95": 7680,
>                 "percentile_99": 29568,
>                 "percentile_99_9": 78336,
>                 "percentile_99_99": 123904,
>                 "max": 1562967,
>                 "total_sum": 21418050385
>             }
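>
> (As a quick sanity check on those numbers, the reported mean is just
> total_sum / total_count, and assuming these write_op_duration histograms
> are reported in microseconds, that is a mean of roughly 2.5 ms, a p99 of
> roughly 30 ms, and a worst case of roughly 1.6 s:
>
> $ awk 'BEGIN { printf "%.2f\n", 21418050385 / 8568729 }'
> 2499.56
>
> which matches the "mean" field above.)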
>
>
>
>
> ------------------ Original Message ------------------
> *From:* "Todd Lipcon" <todd@cloudera.com>
> *Sent:* Thursday, November 24, 2016, 11:55 AM
> *To:* "user" <user@kudu.apache.org>
> *Subject:* Re: About data file size and on-disk size
>
> On Wed, Nov 23, 2016 at 2:30 PM, Adar Dembo <adar@cloudera.com> wrote:
>
>> The difference between du with --apparent-size and without suggests
>> that hole punching is working properly. Quick back of the envelope
>> math shows that with 8133 containers, each container is just over 10G
>> of "apparent size", which means nearly all of the containers were full
>> at one point or another. That makes sense; it means that Kudu is
>> generally writing to a small number of containers at any given time,
>> but is filling them up over time.
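>>
>> (As a quick check of that back-of-the-envelope math, using the 81T apparent
>> size and the 8133 .data files reported further down in this thread:
>>
>> $ awk 'BEGIN { printf "%.1f\n", (81 * 1024) / 8133 }'
>> 10.2
>>
>> i.e. a little over 10G of apparent size per container.)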
>>
>> I took a look at the tablet disk estimation code and found that it
>> excludes the size of all of the UNDO data blocks. I think this is
>> because the size estimation is also used to drive decisions regarding
>> delta compaction, but with an UPSERT-only workload like yours, we'd
>> expect to see many UNDO data blocks over time as updated (and now
>> historical) data is further and further compacted. I filed
>> https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
>> However, if this were the case, I'd expect the "tablet history GC"
>> feature (new in Kudu 1.0) to remove old data that was mutated in an
>> UPSERT. The default value for --tablet_history_max_age_sec (which
>> controls how old the data must be before it is removed) is 15 minutes;
>> have you changed the value of this flag? If not, could you look at
>> your tserver log for the presence of major delta compactions? Look for
>> references to MajorDeltaCompactionOp. If there aren't any, that means
>> Kudu isn't getting opportunities to age out old data.
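>>
>> (For example, something along these lines; the log path below is only a
>> guess and depends on where your tserver writes its logs:
>>
>> $ grep -c MajorDeltaCompactionOp /var/log/kudu/kudu-tserver.INFO
>>
>> A count of zero would suggest no major delta compactions have run.)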
>>
>
> Worth noting that major delta compaction doesn't actually remove old
> UNDOs. There are still some open JIRAs about scheduling tasks to age-off
> UNDOs, but as it stands today, they only get collected during a normal
> compaction.
>
> If the workload doesn't involve normal (merging) compactions, then UNDOs
> won't be GCed at all. So, if you have a relatively static set of keys, and
> are just updating them without causing many new inserts, this could be the
> problem.
>
>
>>
>> It's also possible that simply not accounting for the composite index
>> and bloom blocks (see KUDU-1755) is the reason. Take a look at
>> https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
>> and run the same two commands to compare the total on-disk size of all
>> the .data files to the number of bytes that the tserver is aware of.
>> If the two numbers are close, it's a sign that, at the very least,
>> Kudu is aware of and actively managing all that disk space (i.e.
>> there's no "orphaned" data).
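>>
>> (One way to get the first of those two numbers, the total on-disk size of
>> the .data files, is something like the sketch below; adjust the data
>> directory, and note this may not be the exact command from the JIRA
>> comment. The second number comes from the tserver itself, as described
>> there.)
>>
>> $ find /data/kudu/tserver/data -name '*.data' -print0 | du -ch --files0-from=- | tail -n 1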
>>
>
> -Todd
>
>
>>
>>
>>
>> On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <1654407779@qq.com> wrote:
>> > Hi,
>> >
>> >> Can you tell us a little bit more about your table, as well as any deleted
>> >> tables you once had? How many columns did they have?
>> >
>> > I have not deleted any tables.
>> > There is only one table, with 12 columns (string and int), in the Kudu
>> > cluster.
>> > This cluster has three tablet servers.
>> >
>> > I use the upsert operation to insert and update rows.
>> >
>> >> what version of Kudu are you using?
>> >
>> > kudu -version
>> > kudu 1.0.0
>> > revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
>> > build type RELEASE
>> > built by jenkins at 16 Sep 2016 00:23:10 PST on
>> > impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
>> > build id 2016-09-16_00-03-04
>> >
>> >> It's conceivable that there's a pathological case wherein each of the 8133
>> >> data files is used, one at a time, to store data blocks, which would cause
>> >> each to allocate 32 MB of disk space (totaling about 254G).
>> >
>> > Can the number of data files be decreased? The SSD is almost out of
>> > space now.
>> >
>> >> Can you try running du with --apparent-size and compare the results?
>> >
>> > # du -sh /data/kudu/tserver/data/
>> > 213G /data/kudu/tserver/data/
>> > # du -sh --apparent-size  /data/kudu/tserver/data/
>> > 81T /data/kudu/tserver/data/
>> >
>> >> What filesystem is being used for /data/kudu/tserver/data?
>> >
>> > # file -s /dev/vdb1
>> > /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
>> > UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery) (extents)
>> > (64bit) (large files) (huge files)
>> >
>> >
>> > Thanks.
>> >
>> >
>> > ------------------ Original Message ------------------
>> > From: "Adar Dembo" <adar@cloudera.com>
>> > Sent: Wednesday, November 23, 2016, 9:35 AM
>> > To: "user" <user@kudu.apache.org>
>> > Subject: Re: About data file size and on-disk size
>> >
>> > Also, if you haven't explicitly disabled it, each .data file is going
>> > to preallocate 32 MB of data when used. It's conceivable that there's
>> > a pathological case wherein each of the 8133 data files is used, one
>> > at a time, to store data blocks, which would cause each to allocate 32
>> > MB of disk space (totaling about 254G).
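>> >
>> > (As a quick check on that figure:
>> >
>> > $ awk 'BEGIN { printf "%.0f\n", 8133 * 32 / 1024 }'
>> > 254
>> >
>> > i.e. about 254G if every data file kept a full 32 MB preallocation.)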
>> >
>> > Can you tell us a little bit more about your table, as well as any
>> > deleted tables you once had? How many columns did they have? Also,
>> > what version of Kudu are you using?
>> >
>> > On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <adar@cloudera.com> wrote:
>> >> The files in /data/kudu/tserver/data are supposed to be sparse; that
>> >> is, when Kudu decides to delete data, it'll punch a hole in one of
>> >> those files, allowing the filesystem to reclaim the space in that
>> >> hole. Yet, 'du' should reflect that because it measures real space
>> >> usage. Can you try running du with --apparent-size and compare the
>> >> results? If they're the same or similar, it suggests that the hole
>> >> punching behavior isn't working properly. What distribution are you
>> >> using? What filesystem is being used for /data/kudu/tserver/data?
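>> >>
>> >> (If the hole-punching behaviour is unfamiliar, here is a quick
>> >> illustration on a throwaway file; it assumes util-linux fallocate and
>> >> GNU ls/du:
>> >>
>> >> $ fallocate -l 64M scratch.bin
>> >> $ fallocate --punch-hole --offset 0 --length 32M scratch.bin
>> >> $ ls -l scratch.bin    # apparent size is still 64M
>> >> $ du -h scratch.bin    # allocated space has dropped to ~32M
>> >>
>> >> The apparent size stays the same while the allocated blocks shrink, which
>> >> is exactly the gap that du --apparent-size exposes.)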
>> >>
>> >> You should also check if maybe Kudu has failed to delete the data
>> >> belonging to deleted tables. Has this tserver hosted any tablets
>> >> belonging to tables that have since been deleted? Does the tserver log
>> >> describe any errors when trying to delete the data belonging to those
>> >> tablets?
>> >>
>> >> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407779@qq.com> wrote:
>> >>> Hi,
>> >>>
>> >>>
>> >>> I have a table with 16 buckets over 3 physical machines. Each tablet has
>> >>> only one replica.
>> >>>
>> >>>
>> >>> The Tablets web UI shows that each tablet has an on-disk size of around 4.5G.
>> >>>
>> >>> On this machine there are 8 tablets in total, so the on-disk size should be
>> >>> about 4.5 * 8 = 36G.
>> >>>
>> >>> However, on the same machine, the disk space actually used is about 211G.
>> >>>
>> >>>
>> >>> # du -sh /data/kudu/tserver/data/
>> >>>
>> >>> 210G /data/kudu/tserver/data/
>> >>>
>> >>>
>> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>> >>>
>> >>> 8133
>> >>>
>> >>>
>> >>>
>> >>> What's the difference between the data file size and the on-disk size?
>> >>>
>> >>> Can files in /data/kudu/tserver/data/ be compacted, purged, or can some of
>> >>> them be deleted?
>> >>>
>> >>>
>> >>> Thanks very much.
>> >>>
>> >>>
>> >>> BR
>> >>>
>> >>> Brooks
>> >>>
>> >>>
>> >>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
