hive-user mailing list archives

From Prasad Chakka <pcha...@facebook.com>
Subject Re: Adding/appending data to existing table/partition
Date Wed, 17 Mar 2010 21:06:41 GMT
The same approach works for regular tables, as long as you put the files in the location Hive expects for the table or partition.


On Mar 17, 2010, at 1:57 PM, Ryan LeCompte wrote:

This is interesting... thanks for the response.

My tables are not defined as "external" tables, however. I wonder if this would still work?

Thanks,
Ryan


On Wed, Mar 17, 2010 at 4:46 PM, Yen Pai <yen.pai@gmail.com> wrote:
Hi Ryan,

I was just experimenting with this recently, and this is my experience with "external" tables.
I would imagine regular tables work similarly.

In Hive a partition is actually a folder in HDFS, so if you put another file in the partition
folder, formatted according to the original table definition, you are in effect "appending"
to the partition.

For example, if your table exists as:
/user/hive/warehouse/mytable/

And you have a partition folder:
/user/hive/warehouse/mytable/2010-03-16/

With data files inside it:
/user/hive/warehouse/mytable/2010-03-16/data1
/user/hive/warehouse/mytable/2010-03-16/data2

You can just put more files in the partition folder in HDFS (data3, data4, etc.) and they
will be recognized as part of the partition.

- Yen
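
The layout Yen describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `partition_file_path` and `put_command` and the local file `/tmp/data3` are hypothetical, and the command is built but not executed. The HDFS paths are the ones from the message above.

```python
import posixpath
from typing import List


def partition_file_path(table_dir: str, partition: str, filename: str) -> str:
    """Return the HDFS path of a data file inside a partition folder."""
    # HDFS paths always use forward slashes, hence posixpath rather than os.path.
    return posixpath.join(table_dir, partition, filename)


def put_command(local_file: str, hdfs_dest: str) -> List[str]:
    """Build (but do not run) a `hadoop fs -put` command line."""
    return ["hadoop", "fs", "-put", local_file, hdfs_dest]


# Hypothetical new file "data3" added next to the existing data1 and data2.
dest = partition_file_path("/user/hive/warehouse/mytable", "2010-03-16", "data3")
cmd = put_command("/tmp/data3", dest)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run` on a machine with a Hadoop client would perform the copy; the new file must use the same format as the table definition to be read correctly.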




On Wed, Mar 17, 2010 at 1:05 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
Actually, I wasn't clear earlier... we are currently using this syntax for loading data into
the table/partition:

INSERT OVERWRITE TABLE ourtable PARTITION(dt='2010-03-16') ...

If I execute this multiple times, I believe the data will simply be overwritten instead of
appended, right?






On Wed, Mar 17, 2010 at 4:01 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
Awesome! I didn't know this. :) I'll give it a shot, thanks!



On Wed, Mar 17, 2010 at 3:57 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:


On Wed, Mar 17, 2010 at 3:30 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
Hello all,

Is it possible in Hive 0.5 to run multiple inserts into the same Hive table/partition? Or
is this not supported due to the fact that Hadoop doesn't support appends properly?

For example, it would be nice to periodically add new data every 5 minutes to a table that
has a partition column for "date" via multiple periodic INSERT statements.

Thanks!

Ryan

Ryan,

Every file inside the partition folder makes up the partition. So with 'LOAD DATA INPATH (X)', if
X is a unique name it will be "appended".

This works for us since our 5-minute log files all have unique names.

Edward
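
As a rough sketch of the unique-name approach Edward describes, the Python below names each batch after the start of its 5-minute window and builds the matching `LOAD DATA INPATH` statement as a string. The helper names, the `weblog` prefix, and the `/staging` directory are assumptions made for illustration; the table name `ourtable` and the `dt` partition column come from Ryan's earlier message. Nothing is sent to Hive here.

```python
from datetime import datetime


def unique_batch_name(prefix: str, when: datetime) -> str:
    """Name a batch file after the start of its 5-minute window."""
    floored = when.replace(minute=when.minute - when.minute % 5,
                           second=0, microsecond=0)
    return f"{prefix}_{floored:%Y%m%dT%H%M}.log"


def load_statement(hdfs_path: str, table: str, dt: str) -> str:
    """Build a LOAD DATA statement without OVERWRITE, so existing files are kept."""
    return (f"LOAD DATA INPATH '{hdfs_path}' "
            f"INTO TABLE {table} PARTITION (dt='{dt}')")


when = datetime(2010, 3, 17, 21, 6, 41)
name = unique_batch_name("weblog", when)  # hypothetical prefix
stmt = load_statement(f"/staging/{name}", "ourtable", f"{when:%Y-%m-%d}")
print(stmt)
```

Because each window produces a different file name, repeated loads add new files to the partition folder instead of colliding with earlier ones.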





