hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
Date Tue, 05 Jul 2016 05:13:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362017#comment-15362017
] 

Matt McCline commented on HIVE-14004:
-------------------------------------

The patch includes a bunch of changes originally written for HIVE-13974 that rework the handling
of schemas and move to using TypeDescription class instead of arrays/lists of the OrcProto.Type
class.  It solves the bug in this JIRA.

Code in OrcRawRecordMerger that manipulated the include and column name arrays has been moved
to Schema Evolution in an attempt to centralize that logic and it also contributed to the
solution, too.

Now, HIVE-13974 will be based on this code + more logic that needs to be added to the ORC
generation logic for handling logical/reader/file schema for inner STRUCT types and Schema
Evolution.

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14004
>                 URL: https://issues.apache.org/jira/browse/HIVE-14004
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Matt McCline
>         Attachments: HIVE-14004.01.patch, HIVE-14004.01.patch
>
>
> Easiest way to repro is to add TestTxnCommands2
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
>     int[][] tableData = {{1,2},{3,4}};
>     runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
>     runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
>     Worker t = new Worker();
>     t.setThreadId((int) t.getId());
>     t.setHiveConf(hiveConf);
>     AtomicBoolean stop = new AtomicBoolean();
>     AtomicBoolean looped = new AtomicBoolean();
>     stop.set(true);
>     t.init(stop, looped);
>     t.run();
>     runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
>     runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
>     runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
>     t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> Test won't fail but if you look 
> in target/tmp/log/hive.log for the following exception (from Minor compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560))
- job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208)
~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
~[classes/:?]
>         at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
~[classes/:?]
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
~[hadoop-mapreduce-client-common-2.6.1.jar:?]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[?:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[?:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
> {noformat}
> I observed the same on a real cluster.
> Based on my observations, running Major compaction instead of minor, works fine.
> Replacing the DELETE operation with update, makes both Major/Minor run fine.
> The issue itself should be addressed by HIVE-13974 but need to make sure to add the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message