hive-user mailing list archives

From Igor Kuzmenko <f1she...@gmail.com>
Subject Delete hive partition while executing query.
Date Mon, 06 Jun 2016 13:01:30 GMT
Hello, I'm trying to find a safe way to delete a partition together with all
the data it contains.

I'm using Hive 1.2.1 and Hive JDBC driver 1.2.1, and I ran a simple test on a
transactional table:

// Query 1: read the partition (takes about 10-15 seconds)
asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections "
        + "where dt=20151124 and msisdn_last_digit=2", 1);
Thread.sleep(3000);
// Query 2: drop the partition together with its data
asyncExecute("alter table mobile_connections drop if exists partition "
        + "(dt=20151124, msisdn_last_digit=2) purge", 2);
Thread.sleep(3000);
// Queries 3 and 4: read the same partition again
asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections "
        + "where dt=20151124 and msisdn_last_digit=2", 3);
Thread.sleep(3000);
asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections "
        + "where dt=20151124 and msisdn_last_digit=2", 4);

(full code here <http://pastebin.com/LsktC0sx>)
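The pastebin code is not reproduced in this message. A minimal sketch of what an asyncExecute helper of this kind might look like, assuming each call simply starts its own thread and records whether its query succeeded. `AsyncQueryHarness` and `QueryRunner` are hypothetical names; `QueryRunner` stands in for the real JDBC call so the threading logic can be exercised without a HiveServer2 instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical asyncExecute-style harness: each query runs on its own thread
// and its outcome is recorded under the query number.
class AsyncQueryHarness {

    // Hypothetical stand-in for the JDBC call (e.g. Statement.execute).
    @FunctionalInterface
    interface QueryRunner {
        void run(String sql) throws Exception;
    }

    private final QueryRunner runner;
    // Only touched by the calling thread (asyncExecute / awaitAll).
    private final List<Thread> threads = new ArrayList<>();
    // Written by worker threads: "OK" or "FAILED: <message>".
    private final Map<Integer, String> outcomes = new ConcurrentHashMap<>();

    AsyncQueryHarness(QueryRunner runner) {
        this.runner = runner;
    }

    // Starts the query on a new thread and returns immediately.
    void asyncExecute(String sql, int number) {
        Thread worker = new Thread(() -> {
            try {
                runner.run(sql);
                outcomes.put(number, "OK");
            } catch (Exception e) {
                outcomes.put(number, "FAILED: " + e.getMessage());
            }
        }, "query-" + number);
        threads.add(worker);
        worker.start();
    }

    // Waits for every started query, then returns the recorded outcomes.
    Map<Integer, String> awaitAll() {
        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                // Keep the interrupt status; outcomes may be incomplete.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return outcomes;
    }
}
```

Against a real server, the runner could be `sql -> { try (Statement s = connection.createStatement()) { s.execute(sql); } }`, where `connection` is a Hive JDBC `java.sql.Connection`; the `Thread.sleep(3000)` calls stay in the caller, as in the snippet above.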

I create several threads, each executing its query asynchronously. The first
queries the partition, the second drops it, and the others are the same as
the first. The first query takes about 10-15 seconds to complete, so the
"alter table" query starts before the first query completes.
As a result I get:

   - First query - successfully completes
   - Second query - successfully completes
   - Third query - successfully completes
   - Fourth query - throws an exception:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0189_1_00, diagnostics=[Vertex vertex_1461923723503_0189_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0189_1_00 [Map 1], java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
... 15 more
Caused by: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:958)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:937)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:882)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:878)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:878)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:690)
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:366)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:648)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
... 4 more
]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1461923723503_0189_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1461923723503_0189_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at Test$MyRunnable.run(Test.java:54)
at java.lang.Thread.run(Thread.java:745)
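The innermost cause in this trace is a directory listing (`listLocatedStatus`, reached from `AcidUtils.getAcidState` during ORC split generation) of a partition path that no longer exists. As a local-filesystem analogy only, using `java.nio.file` rather than HDFS or any Hive code, the sketch below resolves a partition-like directory, deletes it, and then lists the earlier path: the listing throws `NoSuchFileException` instead of returning an empty result. The class name, directory layout and `bucket_00000` file name are hypothetical, and the sketch does not show why Hive still planned a scan of the dropped partition.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

// Local-filesystem analogy (not Hive or HDFS code): a path resolved before a
// drop is listed after the drop.
class StaleListingDemo {

    // Counts the entries of a directory; throws NoSuchFileException if it is gone.
    static long countEntries(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    static String listAfterDrop() {
        try {
            Path table = Files.createTempDirectory("mobile_connections");
            Path dtDir = table.resolve("dt=20151124");
            Path partition = dtDir.resolve("msisdn_last_digit=2");
            Path dataFile = partition.resolve("bucket_00000"); // hypothetical file name
            Files.createDirectories(partition);
            Files.createFile(dataFile);

            // "Planning": the reader resolves the partition path while it exists.
            Path plannedPath = partition;
            long before = countEntries(plannedPath);

            // "drop partition ... purge": data and directory are removed.
            Files.delete(dataFile);
            Files.delete(partition);

            String outcome;
            try {
                // "Split generation": list the path that was resolved earlier.
                outcome = "empty listing (" + countEntries(plannedPath) + " entries)";
            } catch (NoSuchFileException e) {
                outcome = "NoSuchFileException";
            }

            // Remove the remaining temporary directories.
            Files.delete(dtDir);
            Files.delete(table);
            return "before drop: " + before + " entry; after drop: " + outcome;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `StaleListingDemo.listAfterDrop()` returns `before drop: 1 entry; after drop: NoSuchFileException`, i.e. the stale path fails outright rather than reading as an empty partition.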

Since I'm using a transactional table, I expect all queries executed after
the partition drop to complete successfully with an empty result. Am I
doing something wrong? Is there another way to drop a partition together
with its data?
