hive-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3253) ArrayIndexOutOfBounds exception for deeply nested structs
Date Wed, 12 Jun 2013 22:04:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-3253:
--------------------------------

    Attachment: HIVE-3253.2.patch

HIVE-3253.2.patch
- It increases the number of control characters used as delimiters by LazySimpleSerDe,
avoiding characters that are likely to be present in data. Using the new control characters
is not a backward-compatible change, so you need to set the serde property
hive.serialization.extend.nesting.levels to enable it for a table that uses LazySimpleSerDe.
If your input data might contain these delimiter control characters, you should escape them
and set the escape character using a serde property.

Example :
{code}
create table nestedcomplex (
simple_int int,
max_nested_array  array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<array<int>>>>>>>>>>>>>>>>>>>>>>>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (  'hive.serialization.extend.nesting.levels'='true'
)
;
{code}

- LazySimpleSerDe is used by FileSinkOperator, which is why that operator was limited by the
number of nesting levels the serde supports. We should look at using LazyBinarySerDe here, as
it would be more efficient and is not subject to this nesting level restriction.

- The LazySimpleSerDe used in FileSinkOperator has escaping enabled, so it is safe to extend
the levels of nesting using the new serde property for that use case.

- The patch also gives a better error message when the nesting depth exceeds the maximum
supported level (it is no longer an ArrayIndexOutOfBounds exception).
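
The last point can be illustrated with a minimal Java sketch. This is not the code from the patch: the class SeparatorLevels, the method separatorFor, the exception type and the message wording are all hypothetical and chosen only for illustration. It shows the general idea of checking the nesting level against the size of the separators array before indexing into it, so that the caller sees a descriptive error rather than a raw ArrayIndexOutOfBoundsException.

```java
// Hypothetical illustration only; not Hive's actual implementation.
class SeparatorLevels {

    // Returns the separator byte for the given nesting level, or fails with a
    // descriptive message when the level is outside the configured range.
    static byte separatorFor(byte[] separators, int level) {
        if (level < 0 || level >= separators.length) {
            throw new IllegalArgumentException(
                "Nesting level " + level + " exceeds the " + separators.length
                + " levels of separators supported by this serde configuration");
        }
        return separators[level];
    }

    public static void main(String[] args) {
        // Size 8 matches the hardcoded array quoted in the issue description.
        byte[] separators = new byte[8];
        for (int i = 0; i < separators.length; i++) {
            separators[i] = (byte) (i + 1); // placeholder control characters
        }

        // A level inside the range returns its separator.
        System.out.println(separatorFor(separators, 7));

        // A level outside the range produces the descriptive error.
        try {
            separatorFor(separators, 9);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An explicit check like this keeps the failure close to its cause: the message names both the requested level and the supported maximum, which an index-out-of-bounds exception does not.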
                
> ArrayIndexOutOfBounds exception for deeply nested structs
> ---------------------------------------------------------
>
>                 Key: HIVE-3253
>                 URL: https://issues.apache.org/jira/browse/HIVE-3253
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Travis Crawford
>         Attachments: HIVE-3253.2.patch, HIVE-3253_moar_nesting.1.patch, jsonout.hive
>
>
> It was observed that creating a table with deeply nested structs might throw this exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 9
>         at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
> 	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
> {code}
> The reason is that the separators array is currently hardcoded to size 8 in LazySimpleSerDe.
> {code}
> // Read the separators: We use 8 levels of separators by default, but we
> // should change this when we allow users to specify more than 10 levels
> // of separators through DDL.
> serdeParams.separators = new byte[8];
> {code}
> If possible, we should increase this size or at least make it configurable to properly handle deeply nested structs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
