kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Boris Tyukin <bo...@boristyukin.com>
Subject Impala Parquet to Kudu 1.5 - severe ingest performance degradation
Date Thu, 22 Feb 2018 14:31:20 GMT
Hello,

we just upgraded our dev cluster from Kudu 1.3 to kudu 1.5.0-cdh5.13.1 and
noticed quite severe performance degradation. We did CTAS from Impala
parquet table which has not changed a bit since the upgrade (even the same
# of rows) to Kudu using the follow query below.

It used to take 11-11.5 hours on Kudu 1.3 and now taking 50-55 hours.

Of course Impala version was also bumped with CDH 5.13.

Any clue why it takes so much time now?

Table has 5.5B rows..

create TABLE kudutest_ts.clinical_event_nots

PRIMARY KEY (clinical_event_id)

PARTITION BY HASH(clinical_event_id) PARTITIONS 120

STORED AS KUDU

AS

SELECT

  clinical_event_id,

  encntr_id,

  person_id,

  encntr_financial_id,

  event_id,

  event_title_text,

  CAST(view_level as string) as view_level,

  order_id,

  catalog_cd,

  series_ref_nbr,

  accession_nbr,

  contributor_system_cd,

  reference_nbr,

  parent_event_id,

  event_reltn_cd,

  event_class_cd,

  event_cd,

  event_tag,

  CAST(event_end_dt_tm_os as BIGINT) as event_end_dt_tm_os,

  result_val,

  result_units_cd,

  result_time_units_cd,

  task_assay_cd,

  record_status_cd,

  result_status_cd,

  CAST(authentic_flag as STRING) authentic_flag,

  CAST(publish_flag as STRING) publish_flag,

  qc_review_cd,

  normalcy_cd,

  normalcy_method_cd,

  inquire_security_cd,

  resource_group_cd,

  resource_cd,

  CAST(subtable_bit_map as STRING) subtable_bit_map,

  collating_seq,

  verified_prsnl_id,

  performed_prsnl_id,

  updt_id,

  CAST(updt_task as STRING) updt_task,

  updt_cnt,

  CAST(updt_applctx as STRING) updt_applctx,

  normal_low,

  normal_high,

  critical_low,

  critical_high,

  CAST(event_tag_set_flag as STRING) event_tag_set_flag,

  CAST(note_importance_bit_map as STRING) note_importance_bit_map,

  CAST(order_action_sequence as STRING) order_action_sequence,

  entry_mode_cd,

  source_cd,

  clinical_seq,

  CAST(event_end_tz as STRING) event_end_tz,

  CAST(event_start_tz as STRING) event_start_tz,

  CAST(performed_tz as STRING) performed_tz,

  CAST(verified_tz as STRING) verified_tz,

  task_assay_version_nbr,

  modifier_long_text_id,

  ce_dynamic_label_id,

  CAST(nomen_string_flag as STRING) nomen_string_flag,

  src_event_id,

  CAST(last_utc_ts as BIGINT) last_utc_ts,

  device_free_txt,

  CAST(trait_bit_map as STRING) trait_bit_map,

  CAST(clu_subkey1_flag as STRING) clu_subkey1_flag,

  CAST(clinsig_updt_dt_tm as BIGINT) clinsig_updt_dt_tm,

  CAST(event_end_dt_tm as BIGINT) event_end_dt_tm,

  CAST(event_start_dt_tm as BIGINT) event_start_dt_tm,

  CAST(expiration_dt_tm as BIGINT) expiration_dt_tm,

  CAST(verified_dt_tm as BIGINT) verified_dt_tm,

  CAST(src_clinsig_updt_dt_tm as BIGINT) src_clinsig_updt_dt_tm,

  CAST(updt_dt_tm as BIGINT) updt_dt_tm,

  CAST(valid_from_dt_tm as BIGINT) valid_from_dt_tm,

  CAST(valid_until_dt_tm as BIGINT) valid_until_dt_tm,

  CAST(performed_dt_tm as BIGINT) performed_dt_tm,

  txn_id_text,

  CAST(ingest_dt_tm as BIGINT) ingest_dt_tm

FROM v500.clinical_event

Mime
View raw message