hive-user mailing list archives

From Peyman Mohajerian <>
Subject Re: Data Deleted on Hive External Table
Date Tue, 25 Aug 2015 14:22:13 GMT
The data was generated on another cluster, moved to S3, and then copied
into the warehouse path on my cluster. I then created a schema over it. You
are correct that this is not the right process; we had no plans to do this
in production, it was a POC. Nevertheless, in my view 'external' should
still carry the same meaning: 'Although the data is in the warehouse, I am
only experimenting with different schema designs and creating a temporary
schema over this data, so don't delete the content.' Perhaps there are
other options besides 'external'. Also, if 'external' doesn't mean anything
in this scenario, Hive could throw an exception so that I'm unable to
create the table in the first place.
Again, the above is just my reasoning and I could be wrong about some of it.
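
The two conditions discussed in this thread (a table location under the
warehouse directory, and partition directories named for a column other than
the one in the table definition) can be checked before dropping a table. The
following is a minimal Python sketch, not anything Hive provides. The function
names and the table name 'mytable' are hypothetical, the warehouse path is
assumed to be the default /user/hive/warehouse (the real value comes from
hive.metastore.warehouse.dir), and only plain paths without a scheme such as
hdfs:// are handled.

```python
from pathlib import PurePosixPath

# Assumption: default warehouse path; check hive.metastore.warehouse.dir.
WAREHOUSE_DIR = "/user/hive/warehouse"


def is_under_warehouse(location, warehouse=WAREHOUSE_DIR):
    """Return True if `location` is the warehouse directory or lies below it."""
    loc = PurePosixPath(location)
    root = PurePosixPath(warehouse)
    return loc == root or root in loc.parents


def partition_keys(directory):
    """Return the partition column names found in key=value path segments."""
    parts = PurePosixPath(directory).parts
    return [part.split("=", 1)[0] for part in parts if "=" in part]


if __name__ == "__main__":
    # Hypothetical table directory mirroring the layout described in the thread.
    path = "/user/hive/warehouse/mytable/year=2009"
    print(is_under_warehouse(path))        # location is inside the warehouse
    print(partition_keys(path) == ["yr"])  # directory key vs. table column 'yr'
```

The sketch only flags the situation described here. It says nothing about
what Hive will actually do on 'drop table'.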

On Tue, Aug 25, 2015 at 7:09 AM, Jeetendra G <>

> If you put 'external' in the table definition and point INPATH at the
> original data (where the data lands from the other source), then how does
> the data end up in /user/hive/warehouse? /user/hive/warehouse should only
> be populated with data when the table is 'internal', right?
> On Tue, Aug 25, 2015 at 7:33 PM, Peyman Mohajerian <>
> wrote:
>> Hi Jeetendra,
>> What I was originally saying is that if you drop the table, it will
>> delete the data despite the fact that you put 'external' in the
>> definition. I think this behavior is due to the fact that the data is in
>> /user/hive/warehouse and therefore Hive assumes ownership and ignores the
>> 'external' directive! I would have assumed 'external' would still carry its
>> meaning and dropping the table would not delete the data, but I was wrong.
>> If I got this wrong, please challenge my conclusion.
>> Thanks,
>> Peyman
>> On Mon, Aug 24, 2015 at 11:22 PM, Jeetendra G <>
>> wrote:
>>> Hi Peyman
>>> I created a new Hive external table with partition column name of 'yr'
>>> instead of 'year' pointing to the same base directory.
>>> If this is the case, how come /user/hive/warehouse has the data? It
>>> should not, right?
>>> On Tue, Aug 25, 2015 at 4:41 AM, Peyman Mohajerian <>
>>> wrote:
>>>> Hi Guys,
>>>> I managed to delete some data in HDFS by dropping a partitioned
>>>> external Hive table. One explanation is that the data resided in the
>>>> 'warehouse' directory of Hive and that had something to do with it.
>>>> An alternative explanation may be that my 'drop table' statement didn't
>>>> delete the data but my follow up 'create table' statement with a different
>>>> partition name did. Let me elaborate, files used to be in this directory
>>>> structure:
>>>> /user/hive/warehouse/<tablename>/year=2009
>>>> I created a new Hive external table with partition column name of 'yr'
>>>> instead of 'year' pointing to the same base directory. Is it possible that
>>>> this create statement deleted the data (highly doubt that)? Either case
>>>> was unexpected to me!
>>>> This is on Hive 1.0.
>>>> Thanks,
>>>> Peyman