hive-user mailing list archives

From Bryan Jeffrey <bryan.jeff...@gmail.com>
Subject Re: Hive 0.13.0 - IndexOutOfBounds Exception
Date Tue, 22 Apr 2014 21:36:40 GMT
Prasanth,

Thank you for the help.  It would not have occurred to me to look at
partition sort and order issues from that dump. I may just apply the patch
to my copy of 0.13.
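
For anyone else hitting this, I assume applying the patch would look roughly
like this (a sketch only; the patch file comes from the HIVE-6883 JIRA, and
-p0 versus -p1 depends on how the diff was generated):

  cd apache-hive-0.13.0-src
  patch -p0 < HIVE-6883.patch
  mvn clean package -DskipTests -Phadoop-2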

Regards,

Bryan Jeffrey
On Apr 22, 2014 2:41 PM, "Prasanth Jayachandran" <
pjayachandran@hortonworks.com> wrote:

> Bryan,
>
> This issue is related to https://issues.apache.org/jira/browse/HIVE-6883
>
> The workaround for this issue is to disable
> hive.optimize.sort.dynamic.partition optimization by setting it to false.
>
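> For example (a sketch; this reuses the tables from your repro below, and
> the optimization only needs to be off for the failing insert):
>
>   hive> set hive.optimize.sort.dynamic.partition=false;
>   hive> set hive.exec.dynamic.partition.mode=nonstrict;
>   hive> insert into table data partition (range) select * from loading_data_0;
>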
> We found this issue very late (towards the end of the 0.13 release), so the
> fix wasn’t included in Hive 0.13. It will go into the next patch release or
> the next release. I will request a backport to the Hive 0.13 source as well.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com>
> wrote:
>
> Prasanth,
>
> Was this additional information sufficient?  This is a large roadblock to
> our adopting Hive 0.13.0.
>
> Regards,
>
> Bryan Jeffrey
>
>
> On Tue, Apr 22, 2014 at 7:41 AM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
>
>> Prasanth,
>>
>> The error seems to occur with just about any table.  I mocked up a very
>> simple table (including input data) to make the problem easy to reproduce.
>>
>> hive> create table loading_data_0 (A smallint, B smallint) partitioned by
>> (range int) row format delimited fields terminated by '|' stored as
>> textfile;
>> hive> create table data (A smallint, B smallint) partitioned by (range
>> int) clustered by (A) sorted by (A, B) into 8 buckets stored as orc
>> tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");
>> [root@server ~]# cat test.input
>> 123|436
>> 423|426
>> 223|456
>> 923|486
>> 023|406
>> hive> load data inpath '/test.input' into table loading_data_0 partition
>> (range=123);
>>
>> [root@server scripts]# hive -e "describe data;"
>> Logging initialized using configuration in
>> /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.508 seconds
>> OK
>> a                       smallint
>> b                       smallint
>> range                   int
>>
>> # Partition Information
>> # col_name              data_type               comment
>>
>> range                   int
>> Time taken: 0.422 seconds, Fetched: 8 row(s)
>> [root@server scripts]# hive -e "describe loading_data_0;"
>> Logging initialized using configuration in
>> /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.511 seconds
>> OK
>> a                       smallint
>> b                       smallint
>> range                   int
>>
>> # Partition Information
>> # col_name              data_type               comment
>>
>> range                   int
>> Time taken: 0.37 seconds, Fetched: 8 row(s)
>>
>>
>> [root@server scripts]# hive -e "set
>> hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting =
>> true; set mapred.job.queue.name=orc_queue; explain insert into table
>> data partition (range) select * from loading_data_0;"
>> Logging initialized using configuration in
>> /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.564 seconds
>> OK
>> STAGE DEPENDENCIES:
>>   Stage-1 is a root stage
>>   Stage-0 depends on stages: Stage-1
>>
>> STAGE PLANS:
>>   Stage: Stage-1
>>     Map Reduce
>>       Map Operator Tree:
>>           TableScan
>>             alias: loading_data_0
>>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE
>> Column stats: NONE
>>             Select Operator
>>               expressions: a (type: smallint), b (type: smallint), range
>> (type: int)
>>               outputColumnNames: _col0, _col1, _col2
>>               Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE
>> Column stats: NONE
>>               Reduce Output Operator
>>                 key expressions: _col2 (type: int), -1 (type: int), _col0
>> (type: smallint), _col1 (type: smallint)
>>                 sort order: ++++
>>                 Map-reduce partition columns: _col2 (type: int)
>>                 Statistics: Num rows: 5 Data size: 40 Basic stats:
>> COMPLETE Column stats: NONE
>>                 value expressions: _col0 (type: smallint), _col1 (type:
>> smallint), _col2 (type: int)
>>       Reduce Operator Tree:
>>         Extract
>>           Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE
>> Column stats: NONE
>>           File Output Operator
>>             compressed: false
>>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE
>> Column stats: NONE
>>             table:
>>                 input format:
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>>                 output format:
>> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>>                 serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>>                 name: data
>>
>>   Stage: Stage-0
>>     Move Operator
>>       tables:
>>           partition:
>>             range
>>           replace: false
>>           table:
>>               input format:
>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>>               output format:
>> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>>               name: data
>>
>> Time taken: 0.913 seconds, Fetched: 45 row(s)
>>
>>
>>
>>  [root@server]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict;
>> set hive.enforce.sorting = true; set mapred.job.queue.name=orc_queue;
>> insert into table data partition (range) select * from loading_data_0;"
>> Logging initialized using configuration in
>> /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.513 seconds
>> Total jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks not specified. Estimated from input data size: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapreduce.job.reduces=<number>
>> Starting Job = job_1398130933303_1467, Tracking URL =
>> http://server:8088/proxy/application_1398130933303_1467/
>> Kill Command = /opt/hadoop/latest-hadoop/bin/hadoop job  -kill
>> job_1398130933303_1467
>> Hadoop job information for Stage-1: number of mappers: 1; number of
>> reducers: 1
>> 2014-04-22 11:33:26,984 Stage-1 map = 0%,  reduce = 0%
>> 2014-04-22 11:33:51,833 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_1398130933303_1467 with errors
>> Error during job, obtaining debugging information...
>> Examining task ID: task_1398130933303_1467_m_000000 (and more) from job
>> job_1398130933303_1467
>>
>> Task with the most failures(4):
>> -----
>> Task ID:
>>   task_1398130933303_1467_m_000000
>>
>> URL:
>>
>> http://server:8088/taskdetails.jsp?jobid=job_1398130933303_1467&tipid=task_1398130933303_1467_m_000000
>> -----
>> Diagnostic Messages for this Task:
>> Error: java.lang.RuntimeException:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>> processing row {"a":123,"b":436,"range":123}
>>          at
>> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
>> Error while processing row {"a":123,"b":436,"range":123}
>>         at
>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>>         at
>> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>         ... 8 more
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>         at
>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
>>         at
>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at
>> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>         at
>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at
>> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>         at
>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at
>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>         ... 9 more
>> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>         at java.util.ArrayList.get(ArrayList.java:322)
>>         at
>> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
>>         at
>> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>>         at
>> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
>>         at
>> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
>>         at
>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
>>         at
>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
>>         ... 15 more
>>
>> Container killed by the ApplicationMaster.
>> Container killed on request. Exit code is 143
>> Container exited with a non-zero exit code 143
>>
>>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>> MapReduce Jobs Launched:
>> Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
>>  Total MapReduce CPU Time Spent: 0 msec
>>
>> Does that help?  I took a quick look at ReduceSinkOperator, but was
>> unable to put my finger on the issue.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>>
>>
>> On Mon, Apr 21, 2014 at 10:55 PM, Prasanth Jayachandran <
>> pjayachandran@hortonworks.com> wrote:
>>
>>> Hi Bryan
>>>
>>> Can you provide more information about the input and output tables?
>>> Schema? Partitioning and bucketing information? Explain plan of your insert
>>> query?
>>>
>>> This information will help diagnose the issue.
>>>
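>>> For example, something like the following would capture all of it (a
>>> sketch; the table names are placeholders for yours):
>>>
>>>   hive -e "describe formatted your_input_table;"
>>>   hive -e "describe formatted your_output_table;"
>>>   hive -e "explain <your insert query>;"
>>>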
>>> Thanks
>>> Prasanth
>>>
>>> Sent from my iPhone
>>>
>>> > On Apr 21, 2014, at 7:00 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com>
>>> wrote:
>>> >
>>> > Hello.
>>> >
>>> > I am running Hadoop 2.4.0 and Hive 0.13.0.  I am encountering the
>>> following error when converting a text table to ORC (the command itself
>>> has been removed from this message):
>>> >
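>>> > (For reference, a representative shape for the removed command would be
>>> > an insert such as:
>>> >
>>> >   insert into table orc_table partition (range) select * from text_table;
>>> >
>>> > with orc_table and text_table standing in for the real table names.)
>>> >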
>>> > Error:
>>> >
>>> > Diagnostic Messages for this Task:
>>> > Error: java.lang.RuntimeException:
>>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>>> processing row { - Removed -}
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>>> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>> >         at
>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>> >         at java.security.AccessController.doPrivileged(Native Method)
>>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >         at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
>>> Runtime Error while processing row { - Removed -}
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>> >         ... 8 more
>>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>> >         ... 9 more
>>> > Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>> >         at java.util.ArrayList.get(ArrayList.java:322)
>>> >         at
>>> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
>>> >         at
>>> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>>> >         at
>>> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
>>> >         at
>>> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
>>> >         at
>>> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
>>> >         ... 15 more
>>> >
>>> > Container killed by the ApplicationMaster.
>>> > Container killed on request. Exit code is 143
>>> > Container exited with a non-zero exit code 143
>>> >
>>> > There are a number of older issues associated with IndexOutOfBounds
>>> errors within the serde, but nothing that appears to match this error
>>> specifically.  The error occurs with all tables (including those consisting
>>> exclusively of integers).  Any thoughts?
>>> >
>>> > Regards,
>>> >
>>> > Bryan Jeffrey
>>>
>>
>>
>
>
