hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones
Date Mon, 14 Nov 2016 20:59:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665000#comment-15665000 ]

Sergio Peña commented on HIVE-15199:
------------------------------------

This issue occurs in the following code (Hive.java):
{noformat}
private static void copyFiles(...) {
  ...
  if (renameNonLocal) {
    for (int counter = 1; !destFs.rename(srcP, destPath); counter++) {
      destPath = new Path(destf, name + ("_copy_" + counter) + filetype);
    }
  } else {
    destPath = mvFile(conf, srcP, destPath, isSrcLocal, srcFs, destFs, name, filetype);
  }
  ...
}
{noformat}

Even if the destination file already exists on S3, the {{destFs.rename()}} call succeeds, so the loop never picks a new name and the existing file is replaced.

This does not happen with HDFS. If the destination file already exists on HDFS, the rename fails,
a _copy_<counter> suffix is appended to the file name, and the rename is retried.
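
For illustration, here is a minimal Java sketch of the difference described above. It uses a hypothetical in-memory store ({{ToyStore}}) rather than Hadoop's FileSystem API, and the file names are made up. The first method has the same shape as the loop quoted above. The second probes {{exists()}} before renaming, which is one possible way to avoid depending on rename semantics; it is an assumption for discussion, not necessarily the fix Hive should adopt.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model for discussion; this is NOT Hadoop's FileSystem API. */
public class RenameSketch {

    /** Toy store whose rename() either refuses or silently replaces an existing destination. */
    static final class ToyStore {
        private final Set<String> files = new HashSet<>();
        private final boolean overwriteOnRename;

        ToyStore(boolean overwriteOnRename) { this.overwriteOnRename = overwriteOnRename; }

        void create(String path) { files.add(path); }

        boolean exists(String path) { return files.contains(path); }

        boolean rename(String src, String dst) {
            if (!files.contains(src)) return false;
            // HDFS-like behavior as described above: fail when the destination exists.
            if (files.contains(dst) && !overwriteOnRename) return false;
            files.remove(src);
            files.add(dst); // with overwriteOnRename, this replaces any existing dst
            return true;
        }
    }

    /** Same shape as the quoted loop: relies on rename() failing when dst exists (src must exist). */
    static String moveRelyingOnRename(ToyStore fs, String src, String dir, String name, String type) {
        String dst = dir + "/" + name + type;
        for (int counter = 1; !fs.rename(src, dst); counter++) {
            dst = dir + "/" + name + "_copy_" + counter + type;
        }
        return dst;
    }

    /** Assumed alternative: probe exists() first so the chosen name does not depend on rename(). */
    static String moveWithExistsCheck(ToyStore fs, String src, String dir, String name, String type) {
        String dst = dir + "/" + name + type;
        for (int counter = 1; fs.exists(dst); counter++) {
            dst = dir + "/" + name + "_copy_" + counter + type;
        }
        if (!fs.rename(src, dst)) {
            throw new IllegalStateException("rename failed: " + src);
        }
        return dst;
    }

    public static void main(String[] args) {
        ToyStore s3Like = new ToyStore(true);
        s3Like.create("t1/part.txt");  // row file already in the table
        s3Like.create("tmp/part.txt"); // new file produced by the INSERT
        // The existing file is replaced because rename() never fails.
        System.out.println(moveRelyingOnRename(s3Like, "tmp/part.txt", "t1", "part", ".txt"));

        ToyStore s3Like2 = new ToyStore(true);
        s3Like2.create("t1/part.txt");
        s3Like2.create("tmp/part.txt");
        // The existing file is kept and the new one gets a _copy_ suffix.
        System.out.println(moveWithExistsCheck(s3Like2, "tmp/part.txt", "t1", "part", ".txt"));
    }
}
```

Note that an {{exists()}} check followed by a rename is not atomic, so two concurrent writers could still pick the same name; the sketch only shows why the current loop behaves differently on the two file systems.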

[~stevel@apache.org] Do you know if this is a known bug on the Hadoop side?

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>
> Any INSERT INTO statement run against an S3 table while the scratch directory is also stored on S3 deletes the existing rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
