hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adriaan Tijsseling <adri...@tijsseling.com>
Subject Re: Query using dynamic partitions
Date Sun, 18 Sep 2011 22:04:24 GMT
Thanks much for your help. Your earlier query worked well, but the latest query produces an
error about a mismatch:
Error in semantic analysis: Line 2:23 Cannot insert into target table because column number/types
are different brand: Table insclause-0 has 5 columns, but query has 3 columns.

I tried the redundant version:

FROM raw
insert overwrite table polished partition (partition1, partition2)
select TRANSFORM(raw.partition1, raw.partition2, raw.data)
USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
STRING>, partition1 STRING, partition2 STRING)
CLUSTER BY date

But that produced a table with partitons set to __HIVE_DEFAULT_PARTITION__

This is turning to be a little bit harder than I expected ;)

Adriaan

On 2011/09/18, at 23:23, Sumanth V wrote:

> The earlier query was just to show the dynamic partitions concept.
> For your case, you will have to use a query something like the one below.
> Although I have not given it a try, in theory this should work -
> 
> FROM raw
> insert overwrite table polished partition (partition1, partition2)
> select TRANSFORM(raw.data)
> USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
> STRING>)
> CLUSTER BY date
> 
> 
> Sumanth
> 
> 
> 
> On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling <adriaan@tijsseling.com
>> wrote:
> 
>> I looked at your solution, but the problem is still that the "data" column
>> needs to be processed still. What I want is to process "data" and put the
>> results into a table with partitioned defined by the other columns. With
>> your solution, I get partitions but still with the same unprocessed data.
>> 
>> Adriaan
>> 
>> On 2011/09/18, at 04:56, Sumanth V wrote:
>> 
>>> Hi Adriaan,
>>> 
>>> To use dynamic partition, follow the following steps inside hive shell -
>>> 
>>> #Set the following values -
>>> 
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>> 
>>> set hive.exec.dynamic.partition=true
>>> 
>>> #Create another table -
>>> 
>>> create table raw_2
>>> (
>>> data string
>>> )
>>> partitioned by (partition1 string, partition2 string);
>>> 
>>> #Now insert the values stored in table raw into table raw_2 using the
>>> following query -
>>> 
>>> from raw
>>> insert overwrite table raw_2 partition (partition1, partition2)
>>> select data, partition1, partition2;
>>> 
>>> This will dynamically create the 2 partitions based on the values of
>>> partition1 and partition2 and insert the values of 'data' in the
>> appropriate
>>> partition.
>>> 
>>> Regards,
>>> Sumanth
>>> 
>>> 
>>> 
>>> On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
>>> <adriaan@tijsseling.com>wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have a table created with
>>>> 
>>>> CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
>>>> FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
>>>> 
>>>> I want to further process "data" and put it in a partition (partition1,
>>>> partition2) defined by the values in the relevant row.
>>>> 
>>>> I'm however stuck at trying to use dynamic partitions in a query. With
>>>> predefined partition values it's straightforward:
>>>> 
>>>> FROM (
>>>> FROM raw
>>>> SELECT TRANSFORM(raw.data)
>>>>       USING 'python parser.py' AS (foo STRING, date STRING, bar
>>>> MAP<STRING,STRING>)
>>>> CLUSTER BY date
>>>> ) tmap
>>>> INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
>>>> partition2='p2') SELECT foo, date, bar;
>>>> 
>>>> What would be the best way to define the partition using raw.partition1
>> and
>>>> raw.partition2 as values?
>>>> 
>>>> Thanks much,
>>>> 
>>>> Adriaan
>>>> 
>>>> 
>> 
>> 


Mime
View raw message