kudu-user mailing list archives

From Alexey Serbin <aser...@cloudera.com>
Subject Re: is this mean the disk read rate was too slow
Date Mon, 15 Jul 2019 17:07:26 GMT
Hi,

What timing did you expect for the scan operation, given the size of
the result set?  Was it much faster in the past?  I would start
by making sure the primary key of the table does include the columns used
in the predicate.  Also, if 'trickle inserts' have been running
against the table for a long time, it might be
https://issues.apache.org/jira/browse/KUDU-1400


A good starting point would probably be to run SUMMARY and EXPLAIN for
the query in impala-shell:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_explain_plan.html#perf_profile
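
A minimal sketch of how the EXPLAIN step could be scripted, assuming
impala-shell is on the PATH.  The address impalad-host:21000, the table name
my_table and the helper explain_command are placeholders for illustration,
not values from your cluster.  It only builds the command line with the -i
(impalad address) and -q (query) options; SUMMARY is typed in an interactive
impala-shell session right after the query of interest has finished.

```python
import shlex


def explain_command(impalad, query):
    """Build an impala-shell argv that asks for the plan of `query`."""
    # EXPLAIN only plans the statement; it does not execute the scan.
    statement = "EXPLAIN " + query.strip().rstrip(";")
    return ["impala-shell", "-i", impalad, "-q", statement]


if __name__ == "__main__":
    # Placeholder host and query: substitute your own before running.
    argv = explain_command(
        "impalad-host:21000",
        "SELECT count(*) FROM my_table WHERE id >= 100;",
    )
    # Print a copy-pastable shell command instead of executing it.
    print(shlex.join(argv))
```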

If the time reported for SCAN KUDU is much higher than you expect, most
likely either too much data is being read or it's KUDU-1400.  Also, check
the tserver logs on one of the machines where the scan is running: in case
of slow scan operations there should be traces for them.  Search the logs
for the 'Created scanner ' pattern or the UUID of the scanner.
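
The log search could be done with something like the sketch below.  It is
plain substring matching and assumes nothing about the tserver log format;
the helper find_scanner_lines is a made-up name, and the log file path and
the optional scanner UUID are supplied on the command line.

```python
import sys


def find_scanner_lines(lines, scanner_id=None):
    """Return lines mentioning scanner creation or the given scanner UUID."""
    needles = ["Created scanner "]
    if scanner_id:
        needles.append(scanner_id)
    return [
        line.rstrip("\n")
        for line in lines
        if any(needle in line for needle in needles)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_scanner.py <tserver log file> [scanner UUID]
    log_path = sys.argv[1]
    uuid = sys.argv[2] if len(sys.argv) > 2 else None
    with open(log_path, errors="replace") as log_file:
        for hit in find_scanner_lines(log_file, uuid):
            print(hit)
```

Passing the scanner UUID as the second argument also picks up later lines
that mention the same scanner, which helps to follow a single slow scan.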


Kind regards,

Alexey


On Mon, Jul 15, 2019 at 2:45 AM lk_hadoop <lk_hadoop@163.com> wrote:

> Hi all,
>        My Impala+Kudu cluster suddenly became slow. I suspect the
> disks are not working well. I saw some scan information on Kudu's web UI:
> xxx:8050/scans
>
>
> 7a2bb2a7e62d4614b423f26c3117b49e
> <http://realtimeanalysis-kudu-04-10-8-50-58:8050/tablet?id=7a2bb2a7e62d4614b423f26c3117b49e>
> 1934b20b8ab34ab98f1deb43c3eba4b2 Complete
>
> *SELECT* membership_card_id,
>        tbill_code,
>        goods_id,
>        goods_name,
>        paid_in_amt,
>        profit,
>        dates
>   *FROM* impala::TEST.SALE_BASE_FACT_WITH_MEMBERSHIP_20190626
>  *WHERE* PRIMARY KEY >= <redacted>
>    *AND* PRIMARY KEY < <redacted>
>
> {username='hive'} at 10.8.50.58:46682 19.3 s 37 min
>
> column              cells read  bytes read  blocks read
> membership_card_id  381.10k     3.85M       18
> tbill_code          372.82k     5.99M       27
> goods_id            426.24k     281.3K      8
> dates               426.24k     8.5K        8
> business_id         426.24k     2.7K        8
> goods_name          426.24k     291.1K      8
> paid_in_amt         376.93k     1019.6K     24
> profit              376.93k     1.14M       24
> total               3.21M       12.55M      125
>
> Does this mean the disk read rate was too slow?
>
>
> 2019-07-15
> ------------------------------
> lk_hadoop
>
