hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Hive 0.13.0 - IndexOutOfBounds Exception
Date Tue, 22 Apr 2014 18:41:21 GMT
Bryan,

This issue is related to https://issues.apache.org/jira/browse/HIVE-6883

The workaround for this issue is to disable the hive.optimize.sort.dynamic.partition optimization by setting it to false.
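For example, the setting can be applied per-session before re-running the insert (an illustrative sketch using the table names from the reproduction quoted below; it can also be set in hive-site.xml):

```sql
-- Workaround for HIVE-6883: disable the sort-based dynamic partition optimization
SET hive.optimize.sort.dynamic.partition=false;

-- Then re-run the failing dynamic-partition insert as before
INSERT INTO TABLE data PARTITION (range)
SELECT * FROM loading_data_0;
```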

We found this issue very late (towards the end of the 0.13 release cycle), so the fix was not included in Hive 0.13. It will go into the next patch release or the next release. I will also request a backport to the Hive 0.13 source.
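For context, the IndexOutOfBoundsException in the quoted trace is thrown while StandardStructObjectInspector pairs key column names with key inspectors by index. One plausible reading, consistent with the explain plan below (four key expressions: _col2, the constant -1, _col0, _col1) and the "Index: 3, Size: 3" message, is that four inspectors are paired against only three names. A minimal illustrative sketch in Python (not Hive code; the pairing logic is a deliberate simplification):

```python
def init_struct_inspector(field_names, field_inspectors):
    """Mimics the by-index pairing in StandardStructObjectInspector.init:
    iterates over the inspectors and looks up a name for each one."""
    fields = []
    for i in range(len(field_inspectors)):
        # Raises IndexError when there are more inspectors than names,
        # analogous to java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        fields.append((field_names[i], field_inspectors[i]))
    return fields

# Three registered key names, but four key expressions from the plan:
names = ["_col2", "_col0", "_col1"]                         # Size: 3
inspectors = ["int", "const int -1", "smallint", "smallint"]  # 4 entries

try:
    init_struct_inspector(names, inspectors)
except IndexError:
    print("IndexError at index 3, size 3")  # mirrors the Hive stack trace
```

Disabling the optimization removes the extra key expression, so the two lists line up again.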

Thanks
Prasanth Jayachandran

On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:

> Prasanth,
> 
> Was this additional information sufficient?  This is a large roadblock to our adopting Hive 0.13.0.
> 
> Regards,
> 
> Bryan Jeffrey
> 
> 
> On Tue, Apr 22, 2014 at 7:41 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
> Prasanth,
> 
> The error seems to occur with just about any table.  I mocked up a very simple table
to illustrate the problem (including input data, etc.) to make this easy to repeat.
> 
> hive> create table loading_data_0 (A smallint, B smallint) partitioned by (range int)
row format delimited fields terminated by '|' stored as textfile;
> hive> create table data (A smallint, B smallint) partitioned by (range int) clustered by (A) sorted by (A, B) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");
> [root@server ~]# cat test.input
> 123|436
> 423|426
> 223|456
> 923|486
> 023|406
> hive> load data inpath '/test.input' into table loading_data_0 partition (range=123);
> 
> [root@server scripts]# hive -e "describe data;"
> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
> OK
> Time taken: 0.508 seconds
> OK
> a                       smallint
> b                       smallint
> range                   int
> 
> # Partition Information
> # col_name              data_type               comment
> 
> range                   int
> Time taken: 0.422 seconds, Fetched: 8 row(s)
> [root@server scripts]# hive -e "describe loading_data_0;"
> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
> OK
> Time taken: 0.511 seconds
> OK
> a                       smallint
> b                       smallint
> range                   int
> 
> # Partition Information
> # col_name              data_type               comment
> 
> range                   int
> Time taken: 0.37 seconds, Fetched: 8 row(s)
> 
> 
> [root@server scripts]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting
= true; set mapred.job.queue.name=orc_queue; explain insert into table data partition (range)
select * from loading_data_0;"
> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
> OK
> Time taken: 0.564 seconds
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: loading_data_0
>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats:
NONE
>             Select Operator
>               expressions: a (type: smallint), b (type: smallint), range (type: int)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats:
NONE
>               Reduce Output Operator
>                 key expressions: _col2 (type: int), -1 (type: int), _col0 (type: smallint),
_col1 (type: smallint)
>                 sort order: ++++
>                 Map-reduce partition columns: _col2 (type: int)
>                 Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats:
NONE
>                 value expressions: _col0 (type: smallint), _col1 (type: smallint), _col2
(type: int)
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats:
NONE
>             table:
>                 input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                 serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                 name: data
> 
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             range
>           replace: false
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: data
> 
> Time taken: 0.913 seconds, Fetched: 45 row(s)
> 
> 
> 
>  [root@server]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting
= true; set mapred.job.queue.name=orc_queue; insert into table data partition (range) select
* from loading_data_0;"
> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
> OK
> Time taken: 0.513 seconds
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Job = job_1398130933303_1467, Tracking URL = http://server:8088/proxy/application_1398130933303_1467/
> Kill Command = /opt/hadoop/latest-hadoop/bin/hadoop job  -kill job_1398130933303_1467
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2014-04-22 11:33:26,984 Stage-1 map = 0%,  reduce = 0%
> 2014-04-22 11:33:51,833 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_1398130933303_1467 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1398130933303_1467_m_000000 (and more) from job job_1398130933303_1467
> 
> Task with the most failures(4):
> -----
> Task ID:
>   task_1398130933303_1467_m_000000
> 
> URL:
>   http://server:8088/taskdetails.jsp?jobid=job_1398130933303_1467&tipid=task_1398130933303_1467_m_000000
> -----
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing row {"a":123,"b":436,"range":123}
>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row {"a":123,"b":436,"range":123}
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>         ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException:
Index: 3, Size: 3
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>         ... 9 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
>         ... 15 more
> 
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> 
> 
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 0 msec
> 
> Does that help?  I took a quick look at ReduceSinkOperator, but was unable to put my
finger on the issue.
> 
> Regards,
> 
> Bryan Jeffrey
> 
> 
> 
> On Mon, Apr 21, 2014 at 10:55 PM, Prasanth Jayachandran <pjayachandran@hortonworks.com>
wrote:
> Hi Bryan
> 
> Can you provide more information about the input and output tables? Schema? Partitioning
and bucketing information? Explain plan of your insert query?
> 
> This information will help diagnose the issue.
> 
> Thanks
> Prasanth
> 
> Sent from my iPhone
> 
> > On Apr 21, 2014, at 7:00 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
> >
> > Hello.
> >
> > I am running Hadoop 2.4.0 and Hive 0.13.0.  I am encountering the following error when converting a text table to ORC:
> >
> > Error:
> >
> > Diagnostic Messages for this Task:
> > Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing row { - Removed -}
> >         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row { - Removed -}
> >         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
> >         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> >         ... 8 more
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException:
Index: 3, Size: 3
> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> >         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> >         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> >         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
> >         ... 9 more
> > Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> >         at java.util.ArrayList.get(ArrayList.java:322)
> >         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
> >         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
> >         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
> >         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
> >         ... 15 more
> >
> > Container killed by the ApplicationMaster.
> > Container killed on request. Exit code is 143
> > Container exited with a non-zero exit code 143
> >
> > There are a number of older issues associated with IndexOutOfBounds errors within
the serde, but nothing that appears to specifically match this error.  This occurs with all
tables (including those consisting of exclusively integers).  Any thoughts?
> >
> > Regards,
> >
> > Bryan Jeffrey
> 
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
> 
> 


