hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nishant Kelkar <nishant....@gmail.com>
Subject Re: Collect_set() of non-primitive types
Date Wed, 03 Sep 2014 21:09:16 GMT
Geez, too many mistakes for the day :P

I meant the following above

*CREATE TABLE* rollup_new *AS*
SELECT id, start_time, collect_set(concat_ws(",", object.name
<http://objects.name/>, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;

The change is in the table creation query, since an INSERT OVERWRITE would
expect a struct and get a string instead, thus throwing errors right?



On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar <nishant.k02@gmail.com>
wrote:

> Sorry, I meant the following in my example:
>
> INSERT OVERWRITE TABLE rollup
> SELECT id, start_time, collect_set(concat_ws(",", object.name <http://objects.name/>,
object.value, object.type)) AS product_details
> FROM bar
> GROUP BY id, start_time;
>
> "object" instead of "object*s*", as you have in your example :)
>
> Best Regards,
> Nishant Kelkar
>
>
> On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar <nishant.k02@gmail.com>
> wrote:
>
>> I don't know of anything like what you want atleast until Hive 0.11.
>>
>> However, you could try something like this:
>>
>> INSERT OVERWRITE TABLE rollup
>> SELECT id, start_time, collect_set(concat_ws(",", objects.name, objects.value, objects.type))
AS product_details
>> FROM bar
>> GROUP BY id, start_time;
>>
>>
>> It's a bit hacky, but it does the trick. It basically concats all the
>> objects you care about in a string, and then collects a set of that. Then,
>> when you want to extract a field from the "product_details" array, just do
>> this (say you want to extract the very first product's name in each array):
>>
>> SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
>>
>>
>> Hope that helps!
>>
>> Best Regards,
>> Nishant Kelkar
>>
>>
>>
>> On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina <anusha.mangina@gmail.com>
>> wrote:
>>
>>> I have a table defined as:
>>>
>>> CREATE TABLE foo (
>>>   id INT,
>>>   start_time STRING,
>>>   name STRING,
>>>   value STRING,
>>>   type STRING
>>> )
>>>
>>> The 'name', 'value' and 'type' fields actually describe another object, and
>>> I'd like to turn these into STRUCTs, something like:
>>>
>>> CREATE TABLE bar (
>>>   id INT,
>>>   start_time STRING,
>>>   object STRUCT<name: STRING,
>>>     value: STRING,
>>>     type: STRING>
>>> )
>>>
>>> However, I'd also like to create a rollup table containing the most recent
>>> result for a given 'id' field. There can be multiple different 'object'
>>> structs for a given combination of 'id' and 'start_time', so I thought I'd
>>> turn this into an array of structs. My ideal rollup table would look like:
>>>
>>> CREATE TABLE rollup (
>>>   id INT,
>>>   start_time STRING,
>>>   objects ARRAY<STRUCT<name: STRING,
>>>     value: STRING,
>>>     type: STRING>>
>>> )
>>>
>>> However, I can't do this because the following query fails:
>>>
>>> INSERT OVERWRITE TABLE rollup
>>> SELECT id, start_time, collect_set(object)
>>> FROM bar
>>> GROUP BY id, start_time
>>>
>>> Here's the error I get:
>>>
>>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
>>> but struct<name:string,value:string,type:string> was passed as parameter
1.
>>>
>>> Is there any way I can do this?
>>>
>>>
>>
>

Mime
View raw message