hive-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Change in serdeproperties does not update existing partitions
Date Wed, 14 Sep 2011 11:45:27 GMT
Hey Maxime,

There seems to be some confusion here. You do not need to recreate partitions
every time you change something about the table. If, for example, you are
adding new columns, you can just run alter table ... add columns and then
alter table ... add partition. You do not need to do anything to the existing
partitions in those cases, and things will work fine. What I suggested was a
workaround for the missing ability to change the serdeproperties of an
existing partition. Ideally that would be possible, but the feature is not
there currently.
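
As a minimal sketch of that flow, using the test_table from the original
message (the column name and the partition date are hypothetical, chosen only
for illustration; for a RegexSerDe table, input.regex would presumably also
need a matching capture group, which is the serdeproperties case discussed in
this thread):

```sql
-- Hypothetical new column; existing partitions are left untouched.
alter table test_table add columns (user_agent string);

-- Partitions added from now on are created from the table's current definition.
alter table test_table add partition (day='2011-09-03');
```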

Hope it helps,
Ashutosh

On Tue, Sep 13, 2011 at 11:48, Maxime Brugidou <maxime.brugidou@gmail.com> wrote:

> Thanks, Ashutosh, for your answer. I actually use external tables so that I
> don't drop my partitions' data.
>
> This still seems like odd behavior to me, and I don't see why someone would
> expect it. Whenever I need to add a column to a table (my table here
> represents a log, and it is common to add fields to logs), I need to drop all
> partitions and recreate them. How do people generally handle this?
>
> Do you have a use case where people want to alter a table and not update
> existing partitions? Is it so that if your file format evolves you don't
> have to convert the whole history?
>
> Best,
> Maxime
>
> On Tue, Sep 13, 2011 at 7:03 PM, Ashutosh Chauhan <hashutosh@apache.org> wrote:
>
>> Hey Maxime,
>>
>> Yeah, that's the intended behavior. After you alter a table, all subsequent
>> actions on the table and its partitions will inherit from it. If you want to
>> modify properties of already existing partitions, you should be able to do
>> something like "alter table test_table partition (day='2011-09-02') set
>> serdeproperties ('input.regex' = '(.*)')". Unfortunately, this is not
>> supported currently. Feel free to file a bug for that.
>>
>> A workaround (applicable only because you are using an external table) is
>> to drop the partitions and then add them again. When you drop a partition
>> from an external table, only the metadata gets wiped out; the data is not
>> deleted. So when you add the partition again, it will inherit the table's
>> serde properties and you will get what you are looking for. Use this
>> workaround with care: you don't want to lose your data while recreating
>> partitions.
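>>
>> A sketch of that workaround for the first partition from the original
>> message, assuming test_table is external and the partition sits at its
>> default location (otherwise a location clause would be needed when
>> re-adding it):
>>
>> ```sql
>> -- External table: this removes only the partition metadata, not the files.
>> alter table test_table drop partition (day='2011-09-01');
>>
>> -- Re-adding creates fresh metadata from the table's current serde properties.
>> alter table test_table add partition (day='2011-09-01');
>> ```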
>>
>> Hope it helps,
>> Ashutosh
>>
>> On Tue, Sep 13, 2011 at 06:03, Maxime Brugidou <maxime.brugidou@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am using Hive 0.7 from Cloudera CDH3u0 and I am seeing strange
>>> behavior when I update the serdeproperties of a table (for example, for the
>>> RegexSerDe).
>>>
>>> If you have a simple partitioned table like
>>>
>>> create external table test_table (
>>>     id string)
>>> partitioned by (day string)
>>> row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>> with serdeproperties (
>>>     'input.regex' = '.* ([^ ]*)'
>>> );
>>>
>>> alter table test_table add partition (day='2011-09-01');
>>>
>>> alter table test_table set serdeproperties  (
>>>     'input.regex' = '(.*)'
>>> );
>>>
>>> alter table test_table add partition (day='2011-09-02');
>>>
>>>
>>> The first partition will still use the older regex and the new one will
>>> use the new regex. Is this intended behavior? Why?
>>>
>>> Thanks for your help,
>>> Maxime
>>>
>>>
>>
>
