impala-user mailing list archives

From chao chu <>
Subject Parallel requests to Impala Catalogd
Date Thu, 21 Jul 2016 03:56:35 GMT

Some background info first: we have 30+ tables (in Parquet format) and
20000+ partitions so far. Currently, a single 'refresh table' or
'compute incremental stats' takes more than 20 seconds, which significantly
slows down our data processing pipeline.

We believe we have hit IMPALA-1480, and we are waiting to upgrade to
CDH 5.7 with Impala 2.5.

In the meantime, could we parallelize our DDL/DML operations (i.e., run the
operations for different tables concurrently) to improve our current
situation? Is that something worth trying? Thanks in advance!
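
To make the idea concrete, here is a minimal sketch of what we have in mind:
one 'refresh' statement per table, submitted from a small thread pool. The
names run_per_table and run_statement are hypothetical; run_statement stands
in for whatever client call actually sends a statement to Impala, and the
worker count is an arbitrary assumption. Whether this is any faster depends
on whether catalogd processes such requests concurrently, which is exactly
what we are asking.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_per_table(tables, run_statement, max_workers=4):
    """Issue one REFRESH per table concurrently.

    run_statement is a hypothetical callable that sends a single SQL
    statement to Impala (for example, a thin wrapper around a client
    cursor). Returns a dict mapping each table to the elapsed seconds,
    or to the exception raised for that table.
    """
    def refresh(table):
        start = time.monotonic()
        run_statement(f"REFRESH {table}")
        return time.monotonic() - start

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per table, so statements for the same table never overlap.
        futures = {table: pool.submit(refresh, table) for table in tables}
        for table, future in futures.items():
            try:
                results[table] = future.result()
            except Exception as exc:  # keep going if one table fails
                results[table] = exc
    return results
```

Comparing the per-table timings from this against a sequential run would
show whether the requests really overlap or are simply queued one after
another on the catalog side.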

