hive-dev mailing list archives

From "Benjamin BONNET (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14660) ArrayIndexOutOfBoundsException on delete
Date Sat, 27 Aug 2016 20:12:20 GMT
Benjamin BONNET created HIVE-14660:
--------------------------------------

             Summary: ArrayIndexOutOfBoundsException on delete
                 Key: HIVE-14660
                 URL: https://issues.apache.org/jira/browse/HIVE-14660
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 1.2.1
            Reporter: Benjamin BONNET


Hi,

DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException.
The bug occurs in the Reduce phase when there are fewer reducers than table buckets.

To reproduce, create a simple ACID table:

{code:sql}
CREATE TABLE test (`cle` bigint,`valeur` string)
 PARTITIONED BY (`annee` string)
 CLUSTERED BY (cle) INTO 5 BUCKETS
 TBLPROPERTIES ('transactional'='true');
{code}

Populate it with rows distributed among all buckets, with random values and a few partitions.
Force the number of reducers to be lower than the number of buckets:
{code:sql}
set mapred.reduce.tasks=1;
{code}
Then execute a delete that removes many rows from all the buckets:
{code:sql}
DELETE FROM test WHERE valeur<'some_value';
{code}
The delete then fails with an ArrayIndexOutOfBoundsException:
{code}
2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
        ... 17 more
{code}
Adding logging to FileSinkOperator shows that the operator handles buckets 0, 1, 2, 3, 4,
then 0 again, and fails at line 769. Each time the bucket changes, the operator moves forward
one position in an array of 5 elements (the number of buckets). So when bucket 0 shows up a
second time, the index goes past the end of the array.
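
To illustrate the behaviour described above, here is a minimal Java sketch. It is not the actual FileSinkOperator code: the class BucketSwitchSketch, the process method and the writers array are hypothetical names, and the starting offset of -1 is an assumption chosen so that the failing index is 5, as in the stack trace.

```java
import java.util.Arrays;
import java.util.List;

public class BucketSwitchSketch {

    /**
     * Walks the bucket ids in arrival order and takes a new slot each time
     * the bucket id changes. Returns the number of slots used, or throws
     * ArrayIndexOutOfBoundsException when more slots are needed than buckets.
     */
    static int process(List<Integer> bucketIds, int numBuckets) {
        String[] writers = new String[numBuckets]; // one slot per table bucket
        int offset = -1;      // index of the slot currently in use (assumed start)
        int lastBucket = -1;  // bucket id of the previous row

        for (int bucket : bucketIds) {
            if (bucket != lastBucket) {
                lastBucket = bucket;
                // The slot index only ever moves forward; it is never reset
                // or looked up by bucket id.
                writers[++offset] = "writer-for-bucket-" + bucket;
            }
        }
        return offset + 1;
    }

    public static void main(String[] args) {
        // Each bucket appears in a single run: 5 slots are enough.
        System.out.println(process(Arrays.asList(0, 0, 1, 2, 3, 4), 5));

        // Bucket 0 comes back after bucket 4: a sixth slot is requested.
        try {
            process(Arrays.asList(0, 1, 2, 3, 4, 0), 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds when bucket 0 is seen again");
        }
    }
}
```

In this sketch the second call fails on the sixth bucket switch, which matches the sequence seen in the logs (0, 1, 2, 3, 4, then 0 again).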


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
