kudu-user mailing list archives

From Adar Lieber-Dembo <a...@cloudera.com>
Subject Re: How to decrease kudu server restart time
Date Wed, 15 Aug 2018 16:49:16 GMT
The information you provided is the FsReport from the log file of one
node, and it represents all of the data on that node. Is this the only
table in your cluster? Or do you have others? I didn't see the output
of `kudu local_replica data_size`; did you forget to include that?
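For reference, it's run on the tserver host that holds the replica, passing the same filesystem flags the tserver is configured with (the directory paths below are just examples; the tablet ID is the one from your earlier log):

```
kudu local_replica data_size 1ecbe230e14a4d9f9125dbc49c32860e \
    --fs_wal_dir=/data/kudu/wal \
    --fs_data_dirs=/data1/kudu,/data2/kudu
```

Run per tablet ID, it breaks down on-disk size by rowset and block type, which tells us whether this is a case of very large tablets or just very many tiny blocks.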

It seems that your average block size is quite small (about 60K),
which is part of the reason you're seeing so many blocks. You
mentioned having a high number of updates; Kudu isn't optimized for
that. One of the things that may be happening here is that the table
is fully compacted and yet updates are still streaming in. AFAIK,
ancient history is only cleaned up during compactions, but if there
are none, ancient history will persist and your 3.3 million records
will actually be represented by many more blocks (and bytes) on disk.
I also wonder whether, by virtue of being fully compacted and with
only 50k records being ingested, Kudu is aggressively flushing your
DeltaMemStores (the in-memory stores that accumulate updates) and thus
producing tiny blocks. In a workload with more writes Kudu will be
busy flushing the tablets' MemRowSets at the expense of flushing
DeltaMemStores, so by the time they are flushed, they'll be much
beefier. But an idle Kudu should be compacting those blocks via minor
and major delta compactions, so eventually those tiny blocks will be
coalesced into larger ones.

Your partitioning schema, by virtue of being a hash of the entire
primary key, appears to be optimized for reads at the expense of
writes. That makes sense given how little you're ingesting.

I wouldn't recommend changing any of those parameters; the default
values are usually fine.

How many data directories do you have? We recommend setting
--maintenance_manager_num_threads to be equal to the number of data
directories divided by 3.
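As a sketch, for a tserver with 9 data directories (paths illustrative; substitute your own layout), the gflagfile would look like:

```
# 9 data directories -> 9 / 3 = 3 maintenance manager threads
--fs_data_dirs=/data1/kudu,/data2/kudu,/data3/kudu,/data4/kudu,/data5/kudu,/data6/kudu,/data7/kudu,/data8/kudu,/data9/kudu
--maintenance_manager_num_threads=3
```

More maintenance threads means more concurrent flushes and compactions, which should help keep block counts down on a node with many data directories.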

On Wed, Aug 15, 2018 at 3:21 AM Gary Gao <garygaowork@gmail.com> wrote:
>
> The output of command [kudu local_replica data_size] is shown below, but it seems that the **Total live blocks** figure is the total block count for the table, not for a specific tablet:
>
> Total live blocks: 22515001
> Total live bytes: 1362248371390
> Total live bytes (after alignment): 1446784176128
> Total number of LBM containers: 22403 (17366 full)
> .....
> .....
>
>
> table schema:
>
> create table venus.ods_xk_pay_fee_order(
> time_day bigint,
> CREATETIME BIGINT,
> BUYERID BIGINT,
> SELLERID BIGINT,
> ORDERID String,
> BIZID BIGINT,
> ID BIGINT,
> SELLERFAMILYID BIGINT,
> PRODUCTID BIGINT,
> PRODUCTTYPE BIGINT,
> PRICE BIGINT,
> REALPRICE BIGINT,
> DISCOUNT BIGINT,
> SHARERATE BIGINT,
> DEVICETYPE BIGINT,
> DEVICEID String,
> APPID BIGINT,
> PKNAME String,
> APPVERSION String,
> CREATEIP BIGINT,
> SERIALID String,
> SCID String,
> COMPLETESTATUS BIGINT,
> COMPLETETIME BIGINT,
> TRYCOUNT BIGINT,
> APPCHANNEL String,
> SDKID BIGINT,
> LIVESTATUS BIGINT,
> PAYSTATUS BIGINT,
> THRIDORDERID String,
> LIVESOURCE BIGINT,
> LIVEPRODUCTTYPE BIGINT,
> PAYMODE BIGINT,
> SUBPRODUCTTYPE BIGINT,
> SALETYPE BIGINT,
> primary key(time_day, createtime, buyerid, sellerid, orderid, bizid, id))
> partition by hash (time_day, createtime, buyerid, sellerid, orderid, bizid, id) partitions 3,
> range(time_day)(PARTITION 1483200000 <= values < 1514736000, ...) stored as kudu
>
>
>
> There are only 3.3 million records [in 3 tablets] in this table, and fewer than 50 thousand records are ingested into this table every day, with many updates.
>
>
> I took a deep dive into the kudu flags configuration and found the following flags related to **BLOCK_SIZE**. What are the recommended values for these flags?
>
> --cfile_default_block_size=262144
>
> --deltafile_default_block_size=32768
>
> --default_composite_key_index_block_size_bytes=4096
>
> --tablet_bloom_block_size=4096
>
>
>
> On Tue, Aug 14, 2018 at 5:41 AM Adar Lieber-Dembo <adar@cloudera.com> wrote:
>>
>> > Even after the kudu server started, it still spent too much time copying tablets, as the following tablet block copying log shows:
>> >
>> >
>> > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
>> >   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
>> >     State:       INITIALIZED
>> >     Data state:  TABLET_DATA_COPYING
>> >     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
>> >   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
>> >   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
>>
>> I see that this tablet has over a million blocks, but how are you
>> measuring that it's spending too much time copying? How much time did
>> it take to fully copy this tablet?
>>
>> > 1. It seems the kudu server spent a long time opening log block containers. How can we speed up restarting the kudu server?
>>
>> Your Kudu server log should contain some log messages that'll help us
>> understand what's going on. Look for a message like "Time spent
>> opening block manager" and paste that. Also, can you find and paste
>> the "FS layout report"?
>>
>> In general, the more blocks (and thus block containers) you have, the
>> longer it'll take Kudu to restart. KUDU-2014 has some ideas on how we
>> might improve this.
>>
>> Once a tserver is deemed dead and its data is rereplicated elsewhere,
>> you can just reformat the node (i.e. delete the contents of the WAL,
>> metadata, and data directories). Its contents are no longer necessary,
>> and this will reset the number of log block containers to 0, which
>> will speed up subsequent restarts.
>>
>> > 2. I think the number of blocks has an influence on kudu server restart time and on query time for a specific tablet: the more blocks, the longer the restart time and query time. Is this right?
>>
>> Yes to restarting time, but not necessarily to query time. It really
>> depends on the kinds of queries you're issuing, how many predicates
>> they have, etc.
>>
>> > 3. Why are there more than 1 million blocks in a tablet, as shown in the above Tablet Copy log, while there are fewer than 500 thousand records in the tablet?
>>
>> That's an excellent question. What kind of write workload do you have?
>> What's your table schema and partitioning? Do you have any
>> non-standard flags defined that may affect how Kudu flushes or
>> compacts its data?
>>
>> I'd also suggest running the CLI tool 'kudu local_replica data_size'
>> on that large replica you described above. It will help identify
>> whether this is a case of very large tablets, or just high numbers of
>> blocks.
>>
>> > 4. How can we reduce the number of blocks in a tablet?
>>
>> Once you answer the questions I posed just above, I might be able to
>> offer some recommendations for how to reduce the overall number of
>> blocks.
