hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rex X <dnsr...@gmail.com>
Subject Re: how to create an array from two columns?
Date Sun, 13 Mar 2016 22:24:52 GMT
Very nice! Thank you, Chandeep.


On Sat, Mar 12, 2016 at 8:20 PM, Chandeep Singh <cs@chandeep.com> wrote:

> Not sure if there is a function for that but I wrote a UDF to do so -
> https://github.com/chandeepsingh/Hive-UDFs
>
> hive> ADD JAR hive-udfs-1.0-uber.jar;
> Added [hive-udfs-1.0-uber.jar] to class path
> Added resources: [hive-udfs-1.0-uber.jar]
>
> hive> CREATE TEMPORARY FUNCTION array_dedup AS
> 'com.hive.udfs.UdfArrayDeDup';
> OK
> Time taken: 0.015 seconds
>
> hive> *SELECT array_dedup(array("blah","blah","blah")) from table1 limit
> 1;*
> OK
> ["blah"]
> Time taken: 0.502 seconds, Fetched: 1 row(s)
>
> Here is the code:
>
> package com.hive.udfs;
>
> /**
>  *
>  * @author chandeepsingh
>  * Remove duplicates from an array
>  */
>
> import java.util.ArrayList;
> import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
> import org.apache.hadoop.hive.ql.metadata.HiveException;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
> import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
>
> import java.util.HashSet;
> import java.util.List;
>
> public class UdfArrayDeDup extends GenericUDF {
>
>     ListObjectInspector arrayOI = null;
>
>     @Override
>     public ObjectInspector initialize(ObjectInspector[] arguments)
>             throws UDFArgumentException {
>
>         arrayOI = (ListObjectInspector) arguments[0];
>         return ObjectInspectorUtils.getStandardObjectInspector(arrayOI);
>     }
>
>     @Override
>     public Object evaluate(DeferredObject[] arguments) throws
> HiveException {
>
>         List<?> myArr = (List<?>)
> ObjectInspectorUtils.copyToStandardObject(arguments[0].get(), arrayOI);
>         HashSet<Object> myHashSet = new HashSet<>();
>
>         myHashSet.addAll(myArr);
>
>         if (myHashSet != null) {
>             return new ArrayList<>(myHashSet);
>         } else {
>             return null;
>         }
>
>     }
>
>     @Override
>     public String getDisplayString(String[] input) {
>         return new String();
>     }
> }
>
>
>
> On Mar 13, 2016, at 1:30 AM, Rex X <dnsring@gmail.com> wrote:
>
> For the first question, is there any way to use "set" instead of an
> "array" to dedupe all elements?
>
> "select array(1,1)" will return "[1,1]", not "[1]".
>
>
>
> On Sat, Mar 12, 2016 at 5:26 PM, Rex X <dnsring@gmail.com> wrote:
>
>> Thank you, Chandeep. Yes, my first problem solved.
>> How about the second one? Is there any way to append an element to an
>> existing array?
>>
>>
>>
>> On Sat, Mar 12, 2016 at 5:10 PM, Chandeep Singh <cs@chandeep.com> wrote:
>>
>>> If you only want the array while you’re querying table1 your example
>>> should work. If you want to add AB to the table you’ll probably need to
>>> create a new table by selecting everything you need from table1.
>>>
>>> hive> select * from table1 limit 1;
>>> OK
>>> temp1 temp2 temp3
>>>
>>> hive> select f1, array(f2, f3) AS AB from table1 limit 1;
>>> OK
>>> temp1 [“temp2”,"temp3"]
>>>
>>>
>>> On Mar 13, 2016, at 12:33 AM, Rex X <dnsring@gmail.com> wrote:
>>>
>>> How to make the following work?
>>>
>>> 1. combine columns A and B to make one array as a new column AB. Both
>>> column A and B are string types.
>>>
>>>   select
>>> string_columnA,
>>> string_columnB,
>>> *array(string_columnA, string_columnB) *as AB
>>> from Table1;
>>>
>>> 2. append columnA to an existing array-type column B
>>>
>>> select
>>> string_columnA,
>>> array_columnB,
>>> array_flatmerge(string_columnA, array_columnB) as AB
>>> from Table2;
>>>
>>> In fact, I should say "set" instead of "array" above, since I expect no
>>> duplicates.
>>>
>>> Any idea?
>>>
>>>
>>>
>>
>
>

Mime
View raw message