From user-return-1259-archive-asf-public=cust-asf.ponee.io@kudu.apache.org Thu Feb 22 20:03:12 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 0C6B718064E for ; Thu, 22 Feb 2018 20:03:11 +0100 (CET) Received: (qmail 31898 invoked by uid 500); 22 Feb 2018 19:03:11 -0000 Mailing-List: contact user-help@kudu.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@kudu.apache.org Delivered-To: mailing list user@kudu.apache.org Received: (qmail 31877 invoked by uid 99); 22 Feb 2018 19:03:10 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Feb 2018 19:03:10 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id E9813C170E for ; Thu, 22 Feb 2018 19:03:09 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.898 X-Spam-Level: * X-Spam-Status: No, score=1.898 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=cloudera.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id Grrthcnyow1b for ; Thu, 22 Feb 2018 19:03:07 +0000 (UTC) Received: from mail-io0-f177.google.com (mail-io0-f177.google.com [209.85.223.177]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 5992E5F1B3 for ; Thu, 22 Feb 2018 19:03:07 +0000 (UTC) Received: by mail-io0-f177.google.com with SMTP id e4so7096723iob.8 for ; Thu, 22 Feb 2018 11:03:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudera.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=tk/h5ZAeNMZrYkgHyFLblY+7LtQ91lYNRbF7VA3etOY=; b=DzEjrWGIPC3o37dKoFeQrWNH50e/YB9NYsl+kwqaRlzJCovClYdeauNw+pamszcyZV NjwnhstJ3OcTT8LrksI2ugBOQr20cJh8LOIosKbY+W4n7a88aA3XhFsIzZgTJ0t0zJEc Txdb/NgcU+Kpm73XpvQOYY21b1LYXD58j18qRCzXgtX3GeBkNLqR9PXPGtP9zeAgTVKe fGJhhppX1EpmuMxWLIwWBuCub1MciApSG1NEOF1b45yw8xpLGbNHm3kg2DaDQ5Q4l7Jy kJCatNOV9Odb5p+oabicFinv+FLprvgioYcP9fWuRu+9BpPAmxM+83smyAum50/TJQ3b Cm2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=tk/h5ZAeNMZrYkgHyFLblY+7LtQ91lYNRbF7VA3etOY=; b=EsUUyiPhwSC7avItEd3sekjCrB9IQSNqKZpDBRYmMHHIrUnxVOgvZstnq3vbe4li9K E17ooMzNoq+WGZIf2r9K+55rLc+mgQTfc7J+glPOGLf0/7i72rhKWBkEmLXaiZVunr/p v10FRwN4p+jkEEKF96tdMbQxzHlv+UKb0DBptTOsDU4by9r6V1o9lGLgrfdaC2DsOmsI 9LRpKNp/69kTcXgK6odrqcxzVe/Nt6p6o+8FXKwR9S5FI6IWopUz3ziZlFuRpQWWnF5k fPWEMYW18zPK8f9E8rGiVPM7GEhtfxqqd4VajIXklufJ++h7VWCTPp/udBwSZx87B8/m ++jg== X-Gm-Message-State: APf1xPA2VK9iIfnJzxLikl5PFa+dEsALWjikBvWrajXOPTU5g7tw0JEQ 7DgzPGI91gH3a3ryZADGb+NQ6kRtwopMq8lZSgJAufLk X-Google-Smtp-Source: AG47ELtDXCFGnejl6grLUVPoxzkEk2c1mdauCl42OMX01rk/iDnQN0pdNLa/hJh7MmWQoz/8PtcHDAUih3R15nliRl8= X-Received: by 10.107.152.82 with SMTP id a79mr3590148ioe.166.1519326180512; Thu, 22 Feb 2018 11:03:00 -0800 (PST) MIME-Version: 1.0 Received: by 10.79.227.142 with HTTP; Thu, 22 Feb 2018 11:02:30 -0800 (PST) In-Reply-To: References: From: Hao Hao Date: Thu, 22 Feb 2018 11:02:30 -0800 Message-ID: Subject: Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation To: user@kudu.apache.org Content-Type: multipart/alternative; boundary="001a11407ca049c2080565d1b1bf" --001a11407ca049c2080565d1b1bf Content-Type: text/plain; charset="UTF-8" Did you happen to check the health of the cluster after the upgrade by 'kudu cluster ksck'? Best, Hao On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin wrote: > Hello, > > we just upgraded our dev cluster from Kudu 1.3 to kudu 1.5.0-cdh5.13.1 > and noticed quite severe performance degradation. We did CTAS from Impala > parquet table which has not changed a bit since the upgrade (even the same > # of rows) to Kudu using the follow query below. > > It used to take 11-11.5 hours on Kudu 1.3 and now taking 50-55 hours. > > Of course Impala version was also bumped with CDH 5.13. > > Any clue why it takes so much time now? > > Table has 5.5B rows.. > > create TABLE kudutest_ts.clinical_event_nots > > PRIMARY KEY (clinical_event_id) > > PARTITION BY HASH(clinical_event_id) PARTITIONS 120 > > STORED AS KUDU > > AS > > SELECT > > clinical_event_id, > > encntr_id, > > person_id, > > encntr_financial_id, > > event_id, > > event_title_text, > > CAST(view_level as string) as view_level, > > order_id, > > catalog_cd, > > series_ref_nbr, > > accession_nbr, > > contributor_system_cd, > > reference_nbr, > > parent_event_id, > > event_reltn_cd, > > event_class_cd, > > event_cd, > > event_tag, > > CAST(event_end_dt_tm_os as BIGINT) as event_end_dt_tm_os, > > result_val, > > result_units_cd, > > result_time_units_cd, > > task_assay_cd, > > record_status_cd, > > result_status_cd, > > CAST(authentic_flag as STRING) authentic_flag, > > CAST(publish_flag as STRING) publish_flag, > > qc_review_cd, > > normalcy_cd, > > normalcy_method_cd, > > inquire_security_cd, > > resource_group_cd, > > resource_cd, > > CAST(subtable_bit_map as STRING) subtable_bit_map, > > collating_seq, > > verified_prsnl_id, > > performed_prsnl_id, > > updt_id, > > CAST(updt_task as STRING) updt_task, > > updt_cnt, > > CAST(updt_applctx as STRING) updt_applctx, > > normal_low, > > normal_high, > > critical_low, > > critical_high, > > CAST(event_tag_set_flag as STRING) event_tag_set_flag, > > CAST(note_importance_bit_map as STRING) note_importance_bit_map, > > CAST(order_action_sequence as STRING) order_action_sequence, > > entry_mode_cd, > > source_cd, > > clinical_seq, > > CAST(event_end_tz as STRING) event_end_tz, > > CAST(event_start_tz as STRING) event_start_tz, > > CAST(performed_tz as STRING) performed_tz, > > CAST(verified_tz as STRING) verified_tz, > > task_assay_version_nbr, > > modifier_long_text_id, > > ce_dynamic_label_id, > > CAST(nomen_string_flag as STRING) nomen_string_flag, > > src_event_id, > > CAST(last_utc_ts as BIGINT) last_utc_ts, > > device_free_txt, > > CAST(trait_bit_map as STRING) trait_bit_map, > > CAST(clu_subkey1_flag as STRING) clu_subkey1_flag, > > CAST(clinsig_updt_dt_tm as BIGINT) clinsig_updt_dt_tm, > > CAST(event_end_dt_tm as BIGINT) event_end_dt_tm, > > CAST(event_start_dt_tm as BIGINT) event_start_dt_tm, > > CAST(expiration_dt_tm as BIGINT) expiration_dt_tm, > > CAST(verified_dt_tm as BIGINT) verified_dt_tm, > > CAST(src_clinsig_updt_dt_tm as BIGINT) src_clinsig_updt_dt_tm, > > CAST(updt_dt_tm as BIGINT) updt_dt_tm, > > CAST(valid_from_dt_tm as BIGINT) valid_from_dt_tm, > > CAST(valid_until_dt_tm as BIGINT) valid_until_dt_tm, > > CAST(performed_dt_tm as BIGINT) performed_dt_tm, > > txn_id_text, > > CAST(ingest_dt_tm as BIGINT) ingest_dt_tm > > FROM v500.clinical_event > --001a11407ca049c2080565d1b1bf Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Did you happen to check the health of the cluster after th= e upgrade by 'kudu cluster=C2=A0ksck'?

Best,
Hao

On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin <boris@boristyukin.c= om> wrote:
Hello,

we just upgraded our dev cluster from Kudu 1.3 t= o=C2=A0kudu 1.5.0-cdh5.13.1 and noticed quite severe performance degrada= tion. We did CTAS from Impala parquet table which has not changed a bit sin= ce the upgrade (even the same # of rows) to Kudu using the follow query bel= ow.

It used to take 11-11.5 ho= urs on Kudu 1.3 and now taking 50-55 hours.

=
Of course Impala version was also bumped with CDH 5.13.

Any clue why it takes so much time = now?

Table has 5.5B rows..

create TABLE = kudutest_ts.clinical_event_nots

PRIMARY KEY (clinical_event_id)

PARTITION BY HASH(clinical_event_id) PARTITIONS 120<= span>

STORED AS KUDU

AS

SELECT

=C2=A0 clinical_event_id,

=C2=A0 encntr_id,

=C2=A0 person_id,

=C2=A0 encntr_financial_id,

=C2=A0 event_id,

=C2=A0 event_title_text,

=C2=A0 CAST(view_level as string) as view_level,

=C2=A0 order_id,

=C2=A0 catalog_cd,

=C2=A0 series_ref_nbr,

=C2=A0 accession_nbr,

=C2=A0 contributor_system_cd,

=C2=A0 reference_nbr,

=C2=A0 parent_event_id,

=C2=A0 event_reltn_cd,

=C2=A0 event_class_cd,

=C2=A0 event_cd,

=C2=A0 event_tag,

=C2=A0 CAST(event_end_dt_tm_os as BIGINT) as event_e= nd_dt_tm_os,

=C2=A0 result_val,

=C2=A0 result_units_cd,

=C2=A0 result_time_units_cd,

=C2=A0 task_assay_cd,

=C2=A0 record_status_cd,

=C2=A0 result_status_cd,

=C2=A0 CAST(authentic_flag as STRING) authentic_flag= ,

=C2=A0 CAST(publish_flag as STRING) publish_flag,

=C2=A0 qc_review_cd,

=C2=A0 normalcy_cd,

=C2=A0 normalcy_method_cd,

=C2=A0 inquire_security_cd,

=C2=A0 resource_group_cd,

=C2=A0 resource_cd,

=C2=A0 CAST(subtable_bit_map as STRING) subtable_bit= _map,

=C2=A0 collating_seq,

=C2=A0 verified_prsnl_id,

=C2=A0 performed_prsnl_id,

=C2=A0 updt_id,

=C2=A0 CAST(updt_task as STRING) updt_task,

=C2=A0 updt_cnt,

=C2=A0 CAST(updt_applctx as STRING) updt_applctx,

=C2=A0 normal_low,

=C2=A0 normal_high,

=C2=A0 critical_low,

=C2=A0 critical_high,

=C2=A0 CAST(event_tag_set_flag as STRING) event_tag_= set_flag,

=C2=A0 CAST(note_importance_bit_map as STRING) note_= importance_bit_map,

=C2=A0 CAST(order_action_sequence as STRING) order_a= ction_sequence,

=C2=A0 entry_mode_cd,

=C2=A0 source_cd,

=C2=A0 clinical_seq,

=C2=A0 CAST(event_end_tz as STRING) event_end_tz,

=C2=A0 CAST(event_start_tz as STRING) event_start_tz= ,

=C2=A0 CAST(performed_tz as STRING) performed_tz,

=C2=A0 CAST(verified_tz as STRING) verified_tz,

=C2=A0 task_assay_version_nbr,

=C2=A0 modifier_long_text_id,

=C2=A0 ce_dynamic_label_id,

=C2=A0 CAST(nomen_string_flag as STRING) nomen_strin= g_flag,

=C2=A0 src_event_id,

=C2=A0 CAST(last_utc_ts as BIGINT) last_utc_ts,

=C2=A0 device_free_txt,

=C2=A0 CAST(trait_bit_map as STRING) trait_bit_map,<= span>

=C2=A0 CAST(clu_subkey1_flag as STRING) clu_subkey1_= flag,

=C2=A0 CAST(clinsig_updt_dt_tm as BIGINT) clinsig_up= dt_dt_tm,

=C2=A0 CAST(event_end_dt_tm as BIGINT) event_end_dt_= tm,

=C2=A0 CAST(event_start_dt_tm as BIGINT) event_start= _dt_tm,

=C2=A0 CAST(expiration_dt_tm as BIGINT) expiration_d= t_tm,

=C2=A0 CAST(verified_dt_tm as BIGINT) verified_dt_tm= ,

=C2=A0 CAST(src_clinsig_updt_dt_tm as BIGINT) src_cl= insig_updt_dt_tm,

=C2=A0 CAST(updt_dt_tm as BIGINT) updt_dt_tm,<= /span>

=C2=A0 CAST(valid_from_dt_tm as BIGINT) valid_from_d= t_tm,

=C2=A0 CAST(valid_until_dt_tm as BIGINT) valid_until= _dt_tm,

=C2=A0 CAST(performed_dt_tm as BIGINT) performed_dt_= tm,

=C2=A0 txn_id_text,

=C2=A0 CAST(ingest_dt_tm as BIGINT) ingest_dt_tm

FROM v500.clinical_event


--001a11407ca049c2080565d1b1bf--