impala-user mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: REFRESH partitions
Date Mon, 19 Mar 2018 19:34:36 GMT
Don't use the -r option to impala-shell! That option was a mistake and it's
removed in Impala 3.0. The problem is that it does a global invalidate,
which is expensive because it requires reloading all metadata.
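
As a rough sketch of the per-partition REFRESH suggested in the quoted reply below, something like the following could be run after each Spark job instead of relying on -r. It assumes the partition columns are literally named year, month and day and hold integer values; the helper name build_refresh_stmt and the table name in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: build a partition-level REFRESH statement for Impala.
# Assumption: the partition columns are named year, month and day and are
# integers. Adjust the names and quoting to match the real table.

# build_refresh_stmt TABLE YEAR MONTH DAY
# Prints the statement to stdout; values are inserted exactly as given.
build_refresh_stmt() {
    printf 'REFRESH %s PARTITION (year=%s, month=%s, day=%s)\n' \
        "$1" "$2" "$3" "$4"
}
```

The statement can then be passed to impala-shell with its -q option once the job has written its files, for example `impala-shell -q "$(build_refresh_stmt my_db.my_table 2018 3 19)"`. This reloads metadata for one partition only, rather than invalidating everything.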

On 19 Mar. 2018 10:35, "Juan" <anyion@gmail.com> wrote:

> If the table is partitioned by year, month, and day, but not hour, running
> RECOVER PARTITIONS is not a good idea.
> RECOVER PARTITIONS only loads metadata when it discovers a new partition.
> For existing partitions, even if there is new data, it will ignore them,
> so the table metadata could be out of date and queries will
> return wrong results.
>
> If the Spark job does not run very frequently, you can run REFRESH on
> the specific partition after the job completes, or run it once
> per hour.
>
> REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]
>
>
> On Sat, Mar 17, 2018 at 1:10 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
>
>> Hello Guys,
>>
>> I have Parquet files that a Spark job generates. I'm defining an
>> external table on these Parquet files, which is partitioned by year, month
>> and day. The Spark job feeds these tables each hour.
>>
>> I have a cron job that runs every hour and runs the command:
>>
>>  alter table $(table_name) recover partitions
>>
>> I'm looking for other solutions in Impala, if there are any, such as a
>> configuration setting. For example, I'm wondering whether I need to educate
>> the end users to use the -r option to refresh the table.
>>
>>
>> Are there any other solutions than recover partitions?
>
