hive-issues mailing list archives

From "Marta Kuczora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16845) INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
Date Wed, 07 Jun 2017 17:13:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041237#comment-16041237 ]

Marta Kuczora commented on HIVE-16845:
--------------------------------------

After investigating the issue, I found that it is related to [HIVE-15114|https://issues.apache.org/jira/browse/HIVE-15114].

The exception in the *ConditionalResolverMergeFiles.generateActualTasks* method occurs when
some partitions should be merged and others should not. For a partition that should not be
merged, the LoadFileWork variable of the move work is null; the LoadTableWork variable is set
instead. The following code therefore throws the NPE when it dereferences the null LoadFileDesc:
{noformat}
MoveWork mvWork = (MoveWork) mvTask.getWork();
LoadFileDesc lfd = mvWork.getLoadFileWork();
Path targetDir = lfd.getTargetDir();
{noformat}

After some more digging, here is a very short summary of my findings:
In the GenMapRedUtils.createMRWorkForMergingFiles method, the "dummyMv" move work is created
and its LoadFileDesc field is set. Then a conditional task is created. If the blobstore
optimizations are enabled, this conditional task won't contain the dummyMv task; it is
created from a move task which contains only the LoadTableWork. This move task is passed to
the ConditionalResolverMergeFiles and, because its LoadFileWork variable is null, it causes
the NPE.
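
To make the failure mode concrete, here is a minimal, self-contained sketch. The nested classes are hypothetical stand-ins that only mimic the getters mentioned above (they are not the real Hive MoveWork, LoadFileDesc, or LoadTableDesc classes), and the guarded variant only illustrates a defensive null check; it is not the fix proposed for this issue.

```java
import java.util.Optional;

public class MoveWorkNullGuardSketch {

    // Hypothetical stand-in for Hive's LoadFileDesc: only the target directory.
    static final class LoadFileDesc {
        private final String targetDir;
        LoadFileDesc(String targetDir) { this.targetDir = targetDir; }
        String getTargetDir() { return targetDir; }
    }

    // Hypothetical stand-in for Hive's LoadTableDesc: only a source path.
    static final class LoadTableDesc {
        private final String sourcePath;
        LoadTableDesc(String sourcePath) { this.sourcePath = sourcePath; }
        String getSourcePath() { return sourcePath; }
    }

    // Hypothetical stand-in for Hive's MoveWork: either descriptor may be null.
    static final class MoveWork {
        private final LoadFileDesc loadFileWork;
        private final LoadTableDesc loadTableWork;
        MoveWork(LoadFileDesc loadFileWork, LoadTableDesc loadTableWork) {
            this.loadFileWork = loadFileWork;
            this.loadTableWork = loadTableWork;
        }
        LoadFileDesc getLoadFileWork() { return loadFileWork; }
        LoadTableDesc getLoadTableWork() { return loadTableWork; }
    }

    // Mirrors the pattern quoted above: dereferences the LoadFileDesc
    // without checking it, so a table-only move work throws an NPE.
    static String unguardedTargetDir(MoveWork mvWork) {
        LoadFileDesc lfd = mvWork.getLoadFileWork();
        return lfd.getTargetDir();
    }

    // Defensive variant: returns an empty Optional when no LoadFileDesc is set.
    static Optional<String> guardedTargetDir(MoveWork mvWork) {
        return Optional.ofNullable(mvWork.getLoadFileWork())
                       .map(LoadFileDesc::getTargetDir);
    }

    public static void main(String[] args) {
        // Move work like the one built when blobstore optimizations are on:
        // only the LoadTableWork is set. The path is a made-up placeholder.
        MoveWork tableOnly =
            new MoveWork(null, new LoadTableDesc("s3a://example-bucket/staging"));

        try {
            unguardedTargetDir(tableOnly);
        } catch (NullPointerException e) {
            System.out.println("unguarded access threw NullPointerException");
        }

        System.out.println("guarded access found a target dir: "
            + guardedTargetDir(tableOnly).isPresent());
    }
}
```

Whether the resolver should skip such a move work, or derive the target directory from the LoadTableWork instead, is a separate design question that this sketch does not answer.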

> INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16845
>                 URL: https://issues.apache.org/jira/browse/HIVE-16845
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>
> *How to reproduce*
> - Create a partitioned table on S3:
> {noformat}
> CREATE EXTERNAL TABLE s3table(user_id string COMMENT '', event_name string COMMENT '')
> PARTITIONED BY (reported_date string, product_id int) LOCATION 's3a://<bucket name>';
> {noformat}
> - Create a temp table:
> {noformat}
> create table tmp_table (id string, name string, date string, pid int) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile;
> {noformat}
> - Load the following rows into the temp table:
> {noformat}
> u1	value1	2017-04-10	10000
> u2	value2	2017-04-10	10000
> u3	value3	2017-04-10	10001
> {noformat}
> - Set the following parameters:
> -- hive.exec.dynamic.partition.mode=nonstrict
> -- mapreduce.input.fileinputformat.split.maxsize=10
> -- hive.blobstore.optimizations.enabled=true
> -- hive.blobstore.use.blobstore.as.scratchdir=false
> -- hive.merge.mapfiles=true
> - Insert the rows from the temp table into the s3 table:
> {noformat}
> INSERT OVERWRITE TABLE s3table
> PARTITION (reported_date, product_id)
> SELECT
>   t.id as user_id,
>   t.name as event_name,
>   t.date as reported_date,
>   t.pid as product_id
> FROM tmp_table t;
> {noformat}
> An NPE occurs with the following stack trace:
> {noformat}
> 2017-05-08 21:32:50,607 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-184028]: Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask. null
> at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
> at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.generateActualTasks(ConditionalResolverMergeFiles.java:290)
> at org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.getTasks(ConditionalResolverMergeFiles.java:175)
> at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1977)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1690)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1422)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1201)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> ... 11 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)