hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan DolinĂ¡r <dolik....@gmail.com>
Subject UDTF fails when used in LATERAL VIEW
Date Thu, 21 Jun 2012 12:02:20 GMT
Hi,

I've hit problems when writing custom UDTF that should return string
values. I couldn't find anywhere what type should have the values that
get forward()ed to collector. The only info I could dig out from
google was few blogs with examples and 4 UDTFs that are among the hive
sources. From that I figured out, that it should be OK to simply pass
Strings inside the forwarded Object[] array. Here are the relevant
parts of my code:

      private Object[] forwardListObj;

      @Override
      public StructObjectInspector initialize(ObjectInspector[] args)
throws UDFArgumentException {

        // snipped irrelevant code

        forwardListObj = new Object[1];
        forwardListObj[0] = new String();

        ArrayList<String> fieldNames = new ArrayList<String>(1);
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);

        fieldNames.add("section");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
fieldOIs);
      }

In proces() there is simple forwarding of some String:

      forwardListObj[0] = "";
      forward(forwardListObj);
      // OR
      String s = ...
      forwardListObj[0] = s;
      forward(forwardListObj);


I was testing the function with a simple query

SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);

and it worked just as intended. But at the moment I got from testing
to actually using the function in more complex queries, I got into
trouble. Even LATERAL VIEW statement can cause failures:

SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);

causes tasks to fail with exception

java.lang.ClassCastException: java.lang.String cannot be cast to
org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
	at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
	at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
	...

I should also mention that I use custom SerDe and InputFormat for the
'logs' table. When I was trying to figure it out, I was trying to run
the same queries as listed above on different table without the
customizations and it worked correctly too. So I think the SerDe
and/or InputFormat probably play some role in this as well. What I
don't understand is why the problem exhibits itself only with LATERAL
VIEW. Any ideas anyone? Also, is it really correct to send String in
forward()?

Best regards,
Jan

Mime
View raw message