hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kirk True (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1996) "LOAD DATA INPATH" fails when the table already contains a file of the same name
Date Wed, 16 Feb 2011 20:23:24 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kirk True updated HIVE-1996:
----------------------------

    Description: 
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

{{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

{{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

{{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

File is renamed, but Hive.copyFiles doesn't "see" the change in "srcs" as it continues to
use the same array elements (with the un-renamed, old file names). It crashes with this error:

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}

  was:
Steps:

1. From the command line copy the kv2.txt data file into the current user's HDFS directory:

    {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}

2. In Hive, create the table:

    {{create table tst_src1 (key_ int, value_ string);}}

3. Load the data into the table from HDFS:

    {{load data inpath './kv2.txt' into table tst_src1;}}

4. Repeat step 1
5. Repeat step 3

Expected:

To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.

Actual:

File is renamed, but Hive.copyFiles doesn't "see" the change in "srcs" as it continues to
use the same array elements (with the un-renamed, old file names). It crashes with this error:

{{java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
}}


> "LOAD DATA INPATH" fails when the table already contains a file of the same name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True
>
> Steps:
> 1. From the command line copy the kv2.txt data file into the current user's HDFS directory:
> {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}
> 2. In Hive, create the table:
> {{create table tst_src1 (key_ int, value_ string);}}
> 3. Load the data into the table from HDFS:
> {{load data inpath './kv2.txt' into table tst_src1;}}
> 4. Repeat step 1
> 5. Repeat step 3
> Expected:
> To have kv2.txt renamed in HDFS and then copied to the destination as per HIVE-307.
> Actual:
> File is renamed, but Hive.copyFiles doesn't "see" the change in "srcs" as it continues
to use the same array elements (with the un-renamed, old file names). It crashes with this
error:
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
>     at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message