hive-dev mailing list archives

From "Mithun Radhakrishnan (JIRA)" <>
Subject [jira] [Commented] (HIVE-6389) LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps.
Date Mon, 03 Mar 2014 02:37:21 GMT


Mithun Radhakrishnan commented on HIVE-6389:

Hey, Ashutosh. As a matter of fact, this stack trace is the result of running
"select mymap['xyz'] from mytable" when mytable has null values for mymap. Although the bug
is in the LazyBinaryObjectInspector for Maps, it doesn't manifest at read time.
The failure shows up in LazySimpleSerDe because the results of the query are
being serialized into String (i.e. to console).

The LazyBinaryMapOI returns -1 for NULL maps. When the LazySimpleSerDe attempts to convert
this Integer into Text, we get this bad-cast exception. The OI should have been returning
nulls for null objects, like the ColumnarSerDe does.
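
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Hive's code: NullMapLookupSketch and all of its methods are hypothetical stand-ins, and the "\N" null marker is an assumption borrowed from the text-file test data. It only illustrates how a -1 sentinel that makes sense for a size call becomes an Integer when returned from an element lookup, and how a downstream cast to a string type then fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a map ObjectInspector; NOT Hive's actual class.
final class NullMapLookupSketch {

    // For a size call, -1 is a reasonable sentinel meaning "the map itself is NULL".
    static int getMapSize(Map<String, String> map) {
        return map == null ? -1 : map.size();
    }

    // Buggy variant: the -1 sentinel leaks into the element lookup.
    // Because the return type is Object, -1 is autoboxed to an Integer.
    static Object getMapValueElementBuggy(Map<String, String> map, String key) {
        if (map == null) {
            return -1;
        }
        return map.get(key);
    }

    // Fixed variant: a null map or a null key yields null.
    static Object getMapValueElementFixed(Map<String, String> map, String key) {
        if (map == null || key == null) {
            return null;
        }
        return map.get(key);
    }

    // Stand-in for a downstream text serializer that assumes string values.
    // "\N" as the null marker is an assumption for this sketch.
    static String serializeAsString(Object value) {
        return value == null ? "\\N" : (String) value;
    }

    public static void main(String[] args) {
        Map<String, String> nullMap = null;

        // Fixed lookup: the null propagates and serializes as the null marker.
        System.out.println(serializeAsString(getMapValueElementFixed(nullMap, "blah")));

        // Buggy lookup: the Integer reaches the string cast and fails there,
        // far away from the code that produced it.
        try {
            System.out.println(serializeAsString(getMapValueElementBuggy(nullMap, "blah")));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

In this sketch the exception surfaces in the serializer rather than in the lookup, which mirrors why the stack trace points at the serializing SerDe rather than at the map OI.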

The way I tested this is:
1. create table mytable_text( mymap map<string, string> ) stored as textfile;
2. echo "\N\n\N\n\N" > /tmp/mytable.txt && hdfs dfs -copyFromLocal /tmp/mytable.txt
3. create table mytable_rcfile( mymap map<string, string> ) stored as rcfile; -- LazyBinaryColumnarSerDe
4. insert overwrite table mytable_rcfile select mymap from mytable_text;
5. select mymap['blah'] from mytable_rcfile;

Steps 1-4 simply insert a null-map into an RCFile-based table.
Step 5 looks up a key in the null-map, for which LazyBinaryMapOI returns -1 instead of null.

This patch brings LazyBinaryMapOI's behaviour in line with LazyMapOI. (This is likely just
a copy-paste error from getMapSize().)

> LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps.
> ----------------------------------------------------------------------------------------
>                 Key: HIVE-6389
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: Hive-6389.patch
> RCFile tables that use the LazyBinaryColumnarSerDe don't seem to handle look-ups into map-columns when the value of the column is null.
> When an RCFile table is created with LazyBinaryColumnarSerDe (as is default in 0.12), and queried as follows:
> {code}
> select mymap['1024'] from mytable;
> {code}
> and if the mymap column has nulls, then one is treated to the following guttural utterance:
> {code}
> 2014-02-05 21:50:25,050 FATAL mr.ExecMapper ( - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":null,"mymap":null,"isnull":null}
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(
>   at
>   at
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(
>   at
>   at org.apache.hadoop.mapred.LocalJobRunner$Job$
>   at java.util.concurrent.Executors$
>   at
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(
>   at java.util.concurrent.ThreadPoolExecutor$
>   at
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to
>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(
>   at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(
>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(
>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(
>   at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(
>   ... 10 more
> {code}
> A patch is on the way, but the short of it is that the LazyBinaryMapOI needs to return nulls if either the map or the lookup-key is null.
> This is handled correctly for Text data, and for RCFiles using ColumnarSerDe.

This message was sent by Atlassian JIRA
