impala-user mailing list archives

From chao chu <chuchao...@gmail.com>
Subject Parallel requests to Impala Catalogd
Date Thu, 21 Jul 2016 03:56:35 GMT
Hi,

Some background first: we have 30+ tables (in Parquet format) with
20,000+ partitions so far. Currently, a single 'refresh table' or
'compute incremental stats' takes more than 20s, which significantly slows
down our data processing pipeline.

We believe we have hit IMPALA-1480
<https://issues.cloudera.org/browse/IMPALA-1480>, and we are waiting to
upgrade to CDH 5.7 with Impala 2.5.

Could we parallelize our DDL/DML operations (i.e., run the operations for
different tables in parallel) to improve our current situation? Is that
something worth trying? Thanks in advance!
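
A minimal sketch of the per-table parallelism asked about above, assuming a
hypothetical `execute` callable that submits one SQL string to Impala (the
function names `statements_for`, `run_table` and `run_all` are illustrative
and not part of any Impala client). Statements for the same table stay in
order on one worker; different tables run on separate threads. Whether the
catalog service actually handles these requests concurrently is exactly the
open question, so this only shows the client side.

```python
from concurrent.futures import ThreadPoolExecutor


def statements_for(table):
    # Statements for one table, kept in this order on a single worker.
    return [f"REFRESH {table}", f"COMPUTE INCREMENTAL STATS {table}"]


def run_table(table, execute):
    # `execute` is a hypothetical callable (an assumption, not an Impala API)
    # that sends one SQL string to Impala and blocks until it finishes.
    for sql in statements_for(table):
        execute(sql)
    return table


def run_all(tables, execute, max_workers=4):
    """Run each table's statements on its own worker thread.

    Returns a dict mapping each table name to None on success,
    or to the exception raised while processing that table.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {t: pool.submit(run_table, t, execute) for t in tables}
        for table, future in futures.items():
            # exception() waits for the task and returns None if it succeeded.
            results[table] = future.exception()
    return results
```

In practice `execute` would wrap whatever client is already in use, and it
is probably safest for each worker thread to use its own connection. Timing
the run with `max_workers=1` versus a larger value would show whether the
parallel submission helps at all on a given cluster.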

-- 
ChuChao
