impala-user mailing list archives

From Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Subject Re: Difference between LOAD DATA and refresh
Date Mon, 02 Apr 2018 22:28:53 GMT
Hi Antoni,

I apologize for the extremely delayed response. LOAD DATA still requires
the catalog to update the table's metadata, so it remains susceptible to
IMPALA-5058 if that update takes a long time. How long does it usually take
to refresh a partition? That said, IMPALA-5058 is fixed in CDH 5.15, so you
may want to consider upgrading your system if that's possible.

Dimitris

On Mon, Jan 8, 2018 at 8:47 AM, Antoni Ivanov <aivanov@vmware.com> wrote:

> Hi,
>
> We are wondering if we can reduce the impact of
> https://issues.apache.org/jira/browse/IMPALA-5058
>
> Currently we run INSERT statements through Spark and then refresh
> partition x.
>
> We are now thinking of using the LOAD DATA statement directly.
>
> I imagine LOAD DATA doesn't need to communicate with the Hive metastore DB
> (and only updates the HDFS block locations)?
>
> Thanks,
>
> Antoni
>
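
For readers comparing the two approaches in this thread, a minimal sketch of
the statements involved follows. The table name `sales`, the partition column
`day`, and the HDFS staging path are hypothetical and do not come from the
messages above. Per the reply, both approaches end with the catalog updating
the table's metadata, so neither one sidesteps IMPALA-5058 on affected
versions.

```sql
-- Approach 1 (current): data files are written by an external job
-- (for example Spark), then Impala reloads metadata for that partition only.
-- Table, column, and value are hypothetical.
REFRESH sales PARTITION (day='2018-01-08');

-- Approach 2 (proposed): move already-written files from a staging
-- directory into the partition with LOAD DATA.
-- The staging path is hypothetical.
LOAD DATA INPATH '/staging/sales/2018-01-08'
  INTO TABLE sales PARTITION (day='2018-01-08');
```

Timing the REFRESH statement for a typical partition would answer the
question asked in the reply about how long a partition refresh takes.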
