hive-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
Date Tue, 12 Jul 2016 21:10:21 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373699#comment-15373699 ]

Owen O'Malley commented on HIVE-14004:
--------------------------------------

I should give more details. The problem was that OrcInputFormat was modifying the passed-in
Options object, and ACID was reusing the same Options object across the deltas. Thus, when
some of the delta files had fewer columns, the include array wasn't long enough.
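The failure mode described in the comment can be illustrated with a small, self-contained Java sketch. It is hypothetical throughout: `ReadOptions`, `openMutating`, and `openCopying` are illustrative names, not Hive or ORC APIs, and the column counts 7 and 9 are made up. A callee that fills in a caller-owned options object sizes the include array for the first file it opens; when the same object is reused for a file with more columns, indexing past the end of that array throws ArrayIndexOutOfBoundsException. Working on a private copy is one common remedy and is shown only for contrast; it is not necessarily what the attached patches do.

```java
import java.util.Arrays;

public class SharedOptionsSketch {

    // Hypothetical stand-in for a reader options object. Field and method
    // names are illustrative only, not the real ORC Reader.Options API.
    static final class ReadOptions {
        boolean[] include; // one flag per column id of a particular file schema

        ReadOptions copy() {
            ReadOptions c = new ReadOptions();
            c.include = (include == null) ? null : include.clone();
            return c;
        }
    }

    // Buggy pattern: fills in the caller's object in place, so the include
    // array computed for the first file sticks to every later file.
    static int openMutating(ReadOptions opts, int fileColumns) {
        if (opts.include == null) {
            opts.include = new boolean[fileColumns];
            Arrays.fill(opts.include, true);
        }
        int selected = 0;
        for (int col = 0; col < fileColumns; col++) {
            // Throws ArrayIndexOutOfBoundsException when include is shorter
            // than this file's column count.
            if (opts.include[col]) {
                selected++;
            }
        }
        return selected;
    }

    // Contrast: each open works on a private copy, leaving the caller's
    // object untouched so it can be reused safely for the next delta.
    static int openCopying(ReadOptions opts, int fileColumns) {
        return openMutating(opts.copy(), fileColumns);
    }

    public static void main(String[] args) {
        int[] deltaColumns = {7, 9}; // hypothetical column counts of two deltas

        ReadOptions shared = new ReadOptions();
        for (int cols : deltaColumns) {
            System.out.println("copying: " + openCopying(shared, cols));
        }

        ReadOptions reused = new ReadOptions();
        try {
            for (int cols : deltaColumns) {
                System.out.println("mutating: " + openMutating(reused, cols));
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("mutating failed: " + e.getMessage());
        }
    }
}
```

Opening the 7-column file first and then the 9-column file with the reused object fails on column index 7, the same index reported in the exception in this issue; the copying variant opens both files successfully.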

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Owen O'Malley
>         Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, HIVE-14004.03.patch, HIVE-14004.patch
>
>
> The easiest way to reproduce this is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
>     // Seed the ACID table and request a MAJOR compaction of it.
>     int[][] tableData = {{1,2},{3,4}};
>     runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     // Run the compactor Worker in-process to execute the queued compaction.
>     Worker t = new Worker();
>     t.setThreadId((int) t.getId());
>     t.setHiveConf(hiveConf);
>     AtomicBoolean stop = new AtomicBoolean();
>     AtomicBoolean looped = new AtomicBoolean();
>     stop.set(true);
>     t.init(stop, looped);
>     t.run();
>     // Produce new deltas with a DELETE and an UPDATE, then request a MINOR compaction.
>     runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>     runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MINOR'");
>     // The exception below is logged while this minor compaction runs.
>     t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but if you look in target/tmp/log/hive.log you will find the following
> exception (from the minor compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running major compaction instead of minor works fine.
> Replacing the DELETE operation with an UPDATE makes both major and minor compaction run fine.
> The issue itself should be addressed by HIVE-13974, but we need to make sure to add the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
