hive-user mailing list archives

From Raj Hadoop <hadoop...@yahoo.com>
Subject Re: GenericUDF Testing in Hive
Date Tue, 04 Feb 2014 19:32:11 GMT
How can I test a Hive GenericUDF that accepts two parameters, List<T> and T?

Can the List<T> argument be the output of collect_set? Please advise.

I have a GenericUDF that takes (List<T>, T), and I want to test how it works from a Hive
query.





On Monday, January 20, 2014 5:19 PM, Raj Hadoop <hadoopraj@yahoo.com> wrote:
 
 
The following is an example of a GenericUDF. I want to test it through a Hive query,
passing parameters with something like "select ComplexUDFExample('a','b','c') from employees
limit 10".

------------------------------------------------------------------------------------------------------------------------------------------------
 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
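import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

class ComplexUDFExample extends GenericUDF {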
  ListObjectInspector listOI;
  StringObjectInspector elementOI;
  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;
    
    // 2. Check that the list contains strings
    if(!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }
    
    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }
  
  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    
    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());
    
    // check for nulls
    if (list == null || arg == null) {
      return null;
    }
    
    // see if our list contains the value we need
    for(String s: list) {
      if (arg.equals(s)) return Boolean.TRUE;
    }
    return Boolean.FALSE;
  }
  
}
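
Before wiring the function into Hive, the membership and null handling in evaluate() can be checked in isolation. The following is only a minimal sketch in plain Java with no Hive dependencies: the class ArrayContainsSketch and the method arrayContains are hypothetical names that mirror the loop above, not part of the original example or of Hive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the logic in ComplexUDFExample.evaluate(); not Hive code.
final class ArrayContainsSketch {

    // Returns null when either argument is null, otherwise whether the list contains arg.
    static Boolean arrayContains(List<String> list, String arg) {
        if (list == null || arg == null) {
            return null;
        }
        for (String s : list) {
            // arg is non-null here, so a null element in the list simply does not match
            if (arg.equals(s)) {
                return Boolean.TRUE;
            }
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        // Stand-in for an array of strings collected per group
        List<String> collected = Arrays.asList("abc", "xyz", null);
        System.out.println(arrayContains(collected, "xyz")); // true
        System.out.println(arrayContains(collected, "def")); // false
        System.out.println(arrayContains(null, "xyz"));      // null
    }
}
```

This exercises only the comparison logic; the argument-count and type checks in initialize() still need a real Hive session or the Hive object inspector classes.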
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch ''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
------------------------------------------------------------------------------------------------------------------------------------------
 
How can I test this example in a Hive query? I know I am invoking it incorrectly, but how
do I invoke it correctly?
 
My requirement is to pass an array of strings as the first argument and another string as
the second argument, as in the Hive query below.
 
 
Select col1, ComplexUDFExample(collect_set(col2), 'xyz')
from Employees
Group By col1;
 
How do I do that?
 
Thanks in advance.
 
Regards,
Raj