hive-user mailing list archives

From Jan Dolinár <dolik....@gmail.com>
Subject Re: UDTF fails when used in LATERAL VIEW
Date Fri, 22 Jun 2012 05:59:03 GMT
Hi Mark,

Thanks for the suggestion, it is not that naïve :) I tried a lot of things
and combinations, including Text and even LazyString (as I was getting
exceptions about converting String to LazyString at one point...).

But I guess what I missed was the correct setting of the field object
inspectors in initialize(). Only today I found out that the correct way
to do this is to use WritableStringObjectInspector:

    fieldNames.add("section");
    fieldOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);

I couldn't find it before, since I was looking for
TextObjectInspector, which obviously doesn't exist - silly me :)
Anyway, it doesn't fail this way, but things get even weirder.
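
A minimal sketch of the mismatch behind the earlier ClassCastException, using stand-in types rather than the real Hive classes. `TextStandIn`, `StringInspector` and the two inspector implementations are hypothetical names invented for illustration; they only mimic the idea that a writable string inspector casts its input to Text while a Java string inspector casts it to String. The stack trace in the quoted message below shows a String reaching WritableStringObjectInspector.getPrimitiveJavaObject, which is the mismatched pairing in this sketch.

```java
// Stand-in types only: nothing in this block is Hive or Hadoop API.
final class InspectorContractSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable string holder. */
    static final class TextStandIn {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Hypothetical stand-in for a string object inspector. */
    interface StringInspector {
        String getPrimitiveJavaObject(Object o);
    }

    /** Expects plain java.lang.String values. */
    static final class JavaStringInspector implements StringInspector {
        public String getPrimitiveJavaObject(Object o) { return (String) o; }
    }

    /** Expects TextStandIn values, the way a writable string inspector expects Text. */
    static final class WritableStringInspector implements StringInspector {
        public String getPrimitiveJavaObject(Object o) { return ((TextStandIn) o).toString(); }
    }

    /** Returns true if the value can be read through the given inspector. */
    static boolean readable(StringInspector oi, Object value) {
        try {
            oi.getPrimitiveJavaObject(value);
            return true;
        } catch (ClassCastException e) {
            return false; // the inspector and the runtime class of the value disagree
        }
    }

    public static void main(String[] args) {
        TextStandIn text = new TextStandIn();
        text.set("section-a");
        System.out.println(readable(new WritableStringInspector(), text));        // matching pair
        System.out.println(readable(new WritableStringInspector(), "section-a")); // mismatched pair
        System.out.println(readable(new JavaStringInspector(), "section-a"));     // matching pair
    }
}
```

The point of the sketch is only that the forwarded object has to be of the class the inspector reading it expects. Why a writable inspector ended up reading the UDTF output in the LATERAL VIEW plan is not something this sketch explains.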

Simple queries over a table without my SerDe and InputFormat, as well
as the SELECT my_func() ... query, work well, but the LATERAL VIEW
query now returns 0 rows.

At the end of the task log there is the following:

2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 forwarded 1294158 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1294158 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:64918
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 finished. closing...
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 forwarded 2654579 rows
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 finished. closing...
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 forwarded 0 rows
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 Close done

So it looks like the UDTF returns something, but it disappears in
FileSinkOperator. Or is this because the query was executed from the
Hive CLI, so the output is not written to a file but streamed directly?

Also, I would like to ask what is the correct way to set the Text
value before forwarding. I've tried the following three ways:


        PrimitiveObjectInspectorFactory.writableStringObjectInspector.getPrimitiveWritableObject(forwardListObj[0]).set(output);

        ((Text)forwardListObj[0]).set(output);

        forwardListObj[0] = new Text(output);

All of them seem to work exactly the same. I know that the third could
cause performance problems, but I'm not sure which of the first two is
preferred.
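
A minimal sketch of how the in-place variants differ from the allocating one, again with a hypothetical `TextStandIn` class in place of org.apache.hadoop.io.Text and a plain list standing in for a consumer that keeps references to forwarded values. Whether any Hive operator actually retains such references is an assumption of the sketch, not something established in this thread. The first two variants above both end up calling set() on the same object, so the sketch models them with a single method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Stand-in types only: nothing in this block is Hive or Hadoop API.
final class ForwardReuseSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text. */
    static final class TextStandIn {
        private String value;
        TextStandIn(String v) { value = v; }
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** In-place variant: one pre-allocated holder is mutated for every row. */
    static List<Object> forwardReusing(String[] outputs) {
        Object[] forwardListObj = { new TextStandIn("") };
        List<Object> seen = new ArrayList<>();
        for (String output : outputs) {
            ((TextStandIn) forwardListObj[0]).set(output);
            seen.add(forwardListObj[0]); // consumer keeps the reference, not a copy
        }
        return seen;
    }

    /** Allocating variant: a fresh holder is created for every row. */
    static List<Object> forwardAllocating(String[] outputs) {
        Object[] forwardListObj = new Object[1];
        List<Object> seen = new ArrayList<>();
        for (String output : outputs) {
            forwardListObj[0] = new TextStandIn(output);
            seen.add(forwardListObj[0]);
        }
        return seen;
    }

    /** Counts distinct objects by identity (==), ignoring equals(). */
    static int distinctObjects(List<Object> seen) {
        Set<Object> ids = Collections.newSetFromMap(new IdentityHashMap<>());
        ids.addAll(seen);
        return ids.size();
    }

    public static void main(String[] args) {
        String[] rows = { "a", "b", "c" };
        List<Object> reused = forwardReusing(rows);
        List<Object> fresh = forwardAllocating(rows);
        System.out.println(distinctObjects(reused) + " object(s), first value: " + reused.get(0));
        System.out.println(distinctObjects(fresh) + " object(s), first value: " + fresh.get(0));
    }
}
```

Reuse avoids one allocation per row, but every retained reference then shows the most recent value; allocation costs more per row and keeps each value independent.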

Thanks again for your assistance,

Jan

On 6/22/12, Mark Grover <mgrover@oanda.com> wrote:
> Hi Jan,
> Here's my first naïve question:-)
>
> Have you tried returning a Text value instead of String? At least in the case
> of UDFs, returning Text instead of Strings is possible and recommended too.
> I would think it would be the same case with UDTFs.
>
> Mark
>
> ----- Original Message -----
> From: "Jan Dolinár" <dolik.rce@gmail.com>
> To: "user" <user@hive.apache.org>
> Sent: Thursday, June 21, 2012 8:02:20 AM
> Subject: UDTF fails when used in LATERAL VIEW
>
> Hi,
>
> I've hit problems when writing a custom UDTF that should return string
> values. I couldn't find anywhere what type the values that get
> forward()ed to the collector should have. The only info I could dig
> out from Google was a few blogs with examples and the 4 UDTFs that are
> among the Hive sources. From that I figured out that it should be OK
> to simply pass Strings inside the forwarded Object[] array. Here are
> the relevant parts of my code:
>
>       private Object[] forwardListObj;
>
>       @Override
>       public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
>
>         // snipped irrelevant code
>
>         forwardListObj = new Object[1];
>         forwardListObj[0] = new String();
>
>         ArrayList<String> fieldNames = new ArrayList<String>(1);
>         ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);
>
>         fieldNames.add("section");
>         fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
>
>         return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
>       }
>
> In process() there is simple forwarding of some String:
>
>       forwardListObj[0] = "";
>       forward(forwardListObj);
>       // OR
>       String s = ...
>       forwardListObj[0] = s;
>       forward(forwardListObj);
>
>
> I was testing the function with a simple query
>
> SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);
>
> and it worked just as intended. But as soon as I moved from testing
> to actually using the function in more complex queries, I got into
> trouble. Even a simple LATERAL VIEW statement can cause failures:
>
> SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);
>
> causes tasks to fail with the exception
>
> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
> 	at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
> 	at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
> 	at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
> 	at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
> 	...
>
> I should also mention that I use a custom SerDe and InputFormat for
> the 'logs' table. While trying to figure this out, I ran the same
> queries as listed above on a different table without the
> customizations, and they worked correctly too. So I think the SerDe
> and/or InputFormat probably play some role in this as well. What I
> don't understand is why the problem shows up only with LATERAL
> VIEW. Any ideas anyone? Also, is it really correct to send String in
> forward()?
>
> Best regards,
> Jan
>
