kudu-user mailing list archives

From farkas@tf-bic.sk <far...@tf-bic.sk>
Subject Re: poor performance on insert into range partitions and scaling
Date Thu, 02 Aug 2018 11:12:03 GMT
Found the reason in the query profiles: it is the exchange again. The noshuffle hint helped a lot.
When you run create table parq as select * from kudu180M, Impala scans Kudu and writes directly
to HDFS. When you run insert into parq partition (year) select * from kudu180M where partition=2018,
it reads only 45M rows, but the exchange hashes the rows before they are written, so it is slower.
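
For anyone hitting the same thing, this is roughly what the hinted insert looks like. It is only a sketch: parq and kudu180M are the tables mentioned above, and it assumes the filter column is the year partition column (the last column returned by select *), which may not match your schema.

```sql
-- Sketch only: assumes kudu180M ends with a `year` column that feeds the
-- dynamic partition of parq. Adjust the filter to your own schema.
-- The NOSHUFFLE hint asks Impala to skip the hash exchange before the write,
-- so each node writes the rows it scanned.
INSERT INTO parq PARTITION (year) /* +NOSHUFFLE */
SELECT * FROM kudu180M
WHERE year = 2018;
```

Note that without the shuffle each node may write its own file into every partition it has rows for, so this suits inserts that touch few partitions, as here.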

On 2018/07/31 20:59:28, Mike Percy <mpercy@apache.org> wrote: 
> Can you post a query profile from Impala for one of the slow insert jobs?
> 
> Mike
> 
> On Tue, Jul 31, 2018 at 12:56 PM Tomas Farkas <farkas@tf-bic.sk> wrote:
> 
> > Hi,
> > I wanted to share with you the preliminary results of my Kudu testing on AWS.
> > I created a set of performance tests to evaluate different instance
> > types in AWS, different configurations (Kudu separated from Impala, Kudu
> > and Impala on the same nodes), and different drive (st1 and gp2) settings.
> > Here are my results:
> >
> > I was quite disappointed by the inserts in Step3; see the attached SQLs.
> >
> > Any hints or ideas why this does not scale?
> > Thanks
> >
> >
> >
> 
