impala-user mailing list archives

From Juan <any...@gmail.com>
Subject Re: Parallel requests to Impala Catalogd
Date Thu, 28 Jul 2016 03:31:53 GMT
Do you keep adding new partitions?
Refreshing a table can take a long time when the table has many partitions, but
it should be a one-time cost. After that, the table metadata is cached, so
subsequent DML queries should no longer suffer from the slowness.
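
On the question quoted below about running per-table operations in parallel:
the following is only a minimal sketch of that idea, not a tested
recommendation. It assumes the intent is to keep the statements for one table
in order while different tables run concurrently. `run_statement` is a
hypothetical callable that you would supply to submit one SQL string through
whatever Impala client you use; it is not part of any Impala API. Whether this
actually shortens the pipeline depends on how catalogd handles concurrent
requests, which the sketch does not address.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_table(statements_by_table, run_statement, max_workers=4):
    """Run each table's statements in order, with different tables in parallel.

    statements_by_table: dict mapping table name -> list of SQL strings.
    run_statement: hypothetical callable (supplied by the caller) that submits
                   one SQL string to Impala and returns its result.
    Returns a dict mapping table name -> list of results from run_statement.
    """

    def run_table(statements):
        # Statements for a single table stay sequential to preserve ordering,
        # e.g. the refresh runs before the stats computation.
        return [run_statement(sql) for sql in statements]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            table: pool.submit(run_table, statements)
            for table, statements in statements_by_table.items()
        }
        # result() re-raises any exception raised in the worker thread.
        return {table: future.result() for table, future in futures.items()}
```

For example, `run_per_table({"t1": ["refresh t1", "compute incremental stats t1"],
"t2": ["refresh t2"]}, run_statement)` would submit the two `t1` statements in
order on one worker while `t2` is handled on another.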

On Wed, Jul 20, 2016 at 8:56 PM, chao chu <chuchao333@gmail.com> wrote:

> Hi,
>
> Some background info first: we have 30+ tables (in Parquet format) with
> 20000+ partitions so far. Currently, a single 'refresh table' or
> 'compute incremental stats' takes more than 20s, which significantly slows
> down our data processing pipeline.
>
> We believe we have hit IMPALA-1480
> <https://issues.cloudera.org/browse/IMPALA-1480>, and we are still waiting
> to upgrade to CDH 5.7 with Impala 2.5.
>
> Could we parallelize our DDL/DML operations (i.e., run the different
> per-table operations in parallel) to improve our current situation? Is that
> worth trying? Thanks in advance!
>
> --
> ChuChao
>
