From: Adar Lieber-Dembo
To: user@kudu.apache.org
Date: Wed, 15 Aug 2018 09:49:16 -0700
Subject: Re: How to decrease kudu server restart time

The information you provided is the FsReport from the log file of one node, and it represents all of the data on that node. Is this the only table in your cluster?
Or do you have others? I didn't see the output of `kudu local_replica data_size`; did you forget to include that?

It seems that your average block size is quite small (about 60K), which is part of the reason you're seeing so many blocks.

You mentioned having a high number of updates; Kudu isn't optimized for that. One of the things that may be happening here is that the table is fully compacted and yet updates are still streaming in. AFAIK, ancient history is only cleaned up during compactions, but if there are none, ancient history will persist and your 3.3 million records will actually be represented by many more blocks (and bytes) on disk.

I also wonder whether, by virtue of being fully compacted and with only 50k records being ingested, Kudu is aggressively flushing your DeltaMemStores (the in-memory stores that accumulate updates) and thus producing tiny blocks. In a workload with more writes, Kudu will be busy flushing the tablets' MemRowSets at the expense of flushing DeltaMemStores, so by the time they are flushed, they'll be much beefier. But an idle Kudu should be compacting those blocks via minor and major delta compactions, so eventually those tiny blocks will be coalesced into larger ones.

Your partitioning schema, by virtue of being a hash of the entire primary key, appears to be optimized for reads at the expense of writes. That makes sense given how little you're ingesting.

I wouldn't recommend changing any of those parameters; the default values are usually fine. How many data directories do you have? We recommend setting --maintenance_manager_num_threads to be equal to the number of data directories divided by 3.
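To make the two numeric points above concrete, here is a quick back-of-the-envelope check in shell. The byte and block totals are the FsReport figures quoted later in this thread; the 9-data-directory layout is only a hypothetical example, not something reported here:

```shell
# Average live block size, from the FsReport totals quoted below
# (1362248371390 live bytes across 22515001 live blocks):
total_live_bytes=1362248371390
total_live_blocks=22515001
echo "avg block size: $((total_live_bytes / total_live_blocks)) bytes"  # ~60 KB

# Rule of thumb from above: maintenance threads = data directories / 3.
# The 9-directory count here is purely illustrative.
data_dirs=9
echo "--maintenance_manager_num_threads=$((data_dirs / 3))"
```

The first number works out to roughly 60 KB per block, which is where the "quite small" observation comes from.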
On Wed, Aug 15, 2018 at 3:21 AM Gary Gao wrote:
>
> The output of the command [kudu local_replica data_size] is shown below, but it seems that the **Total live blocks** figure is the total block count for the table, not for a specific tablet:
>
> Total live blocks: 22515001
> Total live bytes: 1362248371390
> Total live bytes (after alignment): 1446784176128
> Total number of LBM containers: 22403 (17366 full)
> .....
> .....
>
> table schema:
>
> create table venus.ods_xk_pay_fee_order(
> time_day bigint,
> CREATETIME BIGINT,
> BUYERID BIGINT,
> SELLERID BIGINT,
> ORDERID String,
> BIZID BIGINT,
> ID BIGINT,
> SELLERFAMILYID BIGINT,
> PRODUCTID BIGINT,
> PRODUCTTYPE BIGINT,
> PRICE BIGINT,
> REALPRICE BIGINT,
> DISCOUNT BIGINT,
> SHARERATE BIGINT,
> DEVICETYPE BIGINT,
> DEVICEID String,
> APPID BIGINT,
> PKNAME String,
> APPVERSION String,
> CREATEIP BIGINT,
> SERIALID String,
> SCID String,
> COMPLETESTATUS BIGINT,
> COMPLETETIME BIGINT,
> TRYCOUNT BIGINT,
> APPCHANNEL String,
> SDKID BIGINT,
> LIVESTATUS BIGINT,
> PAYSTATUS BIGINT,
> THRIDORDERID String,
> LIVESOURCE BIGINT,
> LIVEPRODUCTTYPE BIGINT,
> PAYMODE BIGINT,
> SUBPRODUCTTYPE BIGINT,
> SALETYPE BIGINT,
> primary key(time_day, createtime, buyerid, sellerid, orderid, bizid, id))
> partition by hash (time_day, createtime, buyerid, sellerid, orderid, bizid, id) partitions 3,
> range(time_day)(PARTITION 1483200000 <= values < 1514736000, ...) stored as kudu
>
> There are only 3.3 million records [in 3 tablets] in this table, and fewer than 50 thousand records are ingested into this table every day, with many updates.
>
> I dug into the Kudu flags configuration and found the following flags related to **BLOCK_SIZE**. What are the recommended values for these flags?
>
> --cfile_default_block_size=262144
> --deltafile_default_block_size=32768
> --default_composite_key_index_block_size_bytes=4096
> --tablet_bloom_block_size=4096
>
> On Tue, Aug 14, 2018 at 5:41 AM Adar Lieber-Dembo wrote:
>>
>> > Even after the kudu server started, it still spent too much time copying tablets, as in the following tablet copy log:
>> >
>> > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
>> >   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
>> >     State:       INITIALIZED
>> >     Data state:  TABLET_DATA_COPYING
>> >     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
>> >   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
>> >   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
>>
>> I see that this tablet has over a million blocks, but how are you
>> measuring that it's spending too much time copying? How much time did
>> it take to fully copy this tablet?
>>
>> > 1. It seems the kudu server spent a long time opening log block containers; how can we speed up restarting the kudu server?
>>
>> Your Kudu server log should contain some log messages that'll help us
>> understand what's going on. Look for a message like "Time spent
>> opening block manager" and paste that. Also, can you find and paste
>> the "FS layout report"?
>>
>> In general, the more blocks (and thus block containers) you have, the
>> longer it'll take Kudu to restart. KUDU-2014 has some ideas on how we
>> might improve this.
>>
>> Once a tserver is deemed dead and its data is rereplicated elsewhere,
>> you can just reformat the node (i.e. delete the contents of the WAL,
>> metadata, and data directories).
>> Its contents are no longer necessary,
>> and this will reset the number of log block containers to 0, which
>> will speed up subsequent restarts.
>>
>> > 2. I think the number of blocks has an influence on kudu server restart time and on query time for a specific tablet: more blocks means more restart time and more query time. Is this right?
>>
>> Yes to restarting time, but not necessarily to query time. It really
>> depends on the kinds of queries you're issuing, how many predicates
>> they have, etc.
>>
>> > 3. Why are there more than 1 million blocks in a tablet, as shown in the above Tablet Copy log, while there are fewer than 500 thousand records in the tablet?
>>
>> That's an excellent question. What kind of write workload do you have?
>> What's your table schema and partitioning? Do you have any
>> non-standard flags defined that may affect how Kudu flushes or
>> compacts its data?
>>
>> I'd also suggest running the CLI tool 'kudu local_replica data_size'
>> on that large replica you described above. It will help identify
>> whether this is a case of very large tablets, or just high numbers of
>> blocks.
>>
>> > 4. How can we reduce the number of blocks in a tablet?
>>
>> Once you answer the questions I posed just above, I might be able to
>> offer some recommendations for how to reduce the overall number of
>> blocks.
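For reference, the CLI invocation discussed in this thread looks roughly like the following when run on a tablet server host. This is a sketch, not output from the thread: the tablet ID is the one from the copy log above, but the WAL and data directory paths are placeholders you'd replace with that server's actual configuration, and the tool reads the local directories directly, so it's typically run while the tserver is stopped:

```shell
# List the replicas present on this tserver's local storage
# (paths are placeholders for the server's configured directories):
kudu local_replica list \
    --fs_wal_dir=/data/kudu/wal \
    --fs_data_dirs=/data1/kudu,/data2/kudu

# Summarize the on-disk size of one replica, e.g. the large tablet
# from the copy log in this thread:
kudu local_replica data_size 1ecbe230e14a4d9f9125dbc49c32860e \
    --fs_wal_dir=/data/kudu/wal \
    --fs_data_dirs=/data1/kudu,/data2/kudu
```

The data_size output distinguishes bytes per column and per rowset, which is what lets you tell "very large tablets" apart from "many tiny blocks."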