hadoop-hive-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
Date Tue, 04 Aug 2009 21:18:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739170#action_12739170 ]

Todd Lipcon commented on HIVE-718:
----------------------------------

bq. I think it's not acceptable for a failed "insert" to corrupt the original data of the
table. 

Then we definitely have to move an entire directory of files in at once. Otherwise an
insert can partially succeed.

bq. We never have a table with sub directories (instead of files) inside. We will need some
testing to make sure it actually works.

Supporting that is going to be necessary for non-overwrite loads into a table/partition, right?

bq. For unique name, maybe we can just prepend the job id.

This isn't always available (e.g. when running LOAD DATA from the CLI). I think we're stuck with
java.util.UUID, as ugly as it may be.

I've spent the last hour or so trying to figure out any other way of generating a unique name
inside a subdirectory. Because of the semantics of FileSystem.mkdirs and FileSystem.rename,
I don't believe there's any way of doing this. mkdirs doesn't return false in the case that
the directory already exists, and if you rename(src, dst), and dst already exists as a directory,
it will move src *inside* of dst.
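
To make the UUID idea concrete, here is a minimal sketch, not the code from any attached patch. It uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem, so the rename semantics differ from the ones described above (Files.move fails if the target exists instead of nesting src inside dst). The class name UniqueLoadDir, the "load" and "_staging_" prefixes, and the staging-directory layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Local-filesystem illustration only: java.nio.file stands in for Hadoop's FileSystem API.
final class UniqueLoadDir {
    private UniqueLoadDir() {}

    /** Builds a child name such as "load_<random UUID>" without probing the filesystem. */
    static String uniqueName(String prefix) {
        return prefix + "_" + UUID.randomUUID();
    }

    /**
     * Moves every regular file in srcDir into a freshly named subdirectory of partitionDir.
     * Files are collected in a staging directory next to the partition first, so the
     * partition itself only ever sees one directory-level move.
     * Assumes the staging directory and the partition live on the same file store.
     */
    static Path loadInto(Path srcDir, Path partitionDir) throws IOException {
        // Like mkdirs, createDirectories does not complain if the directory already exists.
        Files.createDirectories(partitionDir);

        String name = uniqueName("load");
        Path staging = partitionDir.resolveSibling("_staging_" + name);
        Files.createDirectory(staging); // throws FileAlreadyExistsException on a name clash

        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.move(f, staging.resolve(f.getFileName()));
                }
            }
        }

        // Single move of the whole directory into the partition under the unique name.
        return Files.move(staging, partitionDir.resolve(name));
    }
}
```

If a file-level move into the staging directory fails halfway, the partition has not been touched yet, which is the property the first quoted concern asks for. Because the name comes from a random UUID rather than from checking what already exists, the sketch never depends on mkdirs or rename reporting a collision.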

> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-718
>                 URL: https://issues.apache.org/jira/browse/HIVE-718
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>         Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> a       2009-08-01
> b       2009-08-01
> c       2009-08-01
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

