Date: Tue, 25 Aug 2015 07:22:13 -0700
Subject: Re: Data Deleted on Hive External Table
From: Peyman Mohajerian
To: user@hive.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113938eadecf56051e237555

--001a113938eadecf56051e237555
Content-Type: text/plain; charset=UTF-8

The data was generated in another cluster; they moved it to S3 and then copied it into the warehouse path on my cluster. I then created a schema over it. You are correct that this is not the right process, and we had no plans to do this in production; it was a POC.

Nevertheless, in my view 'external' should still carry the same meaning: 'Even though the data is in the warehouse, I am only experimenting with different schema designs and creating a temporary schema over this data, so don't delete the content'. Perhaps there are other options instead of using 'external'. Also, if 'external' doesn't mean anything in this scenario, perhaps Hive should throw an exception so that I'm unable to create the table in the first place.
Again, what I'm saying above is my own reasoning, and I could be wrong about something.

On Tue, Aug 25, 2015 at 7:09 AM, Jeetendra G wrote:

> If you put 'external' in the table definition and point INPATH at the
> original data (where the data lands from the other source), then how does
> the data end up in /user/hive/warehouse? /user/hive/warehouse should only be
> populated with data when the table is 'internal'?
>
> On Tue, Aug 25, 2015 at 7:33 PM, Peyman Mohajerian wrote:
>
>> Hi Jeetendra,
>>
>> What I was originally saying is that if you drop the table, it will
>> delete the data despite the fact that you put 'external' in the
>> definition. I think this behavior is due to the fact that the data is in
>> /user/hive/warehouse and therefore Hive assumes ownership and ignores the
>> 'external' directive! I would have assumed 'external' would still carry its
>> meaning and dropping the table would not delete the data, but I was wrong.
>> If I got this wrong, please challenge my conclusion.
>>
>> Thanks,
>> Peyman
>>
>> On Mon, Aug 24, 2015 at 11:22 PM, Jeetendra G wrote:
>>
>>> Hi Peyman
>>>
>>> I created a new Hive external table with partition column name of 'yr'
>>> instead of 'year' pointing to the same base directory.
>>> If this is the case, how come /user/hive/warehouse has the data? It
>>> should not, right?
>>>
>>> On Tue, Aug 25, 2015 at 4:41 AM, Peyman Mohajerian wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> I managed to delete some data in HDFS by dropping a partitioned
>>>> external Hive table. One explanation is that the data resided in the
>>>> 'warehouse' directory of Hive and that had something to do with it?
>>>> An alternative explanation may be that my 'drop table' statement didn't
>>>> delete the data but my follow-up 'create table' statement with a different
>>>> partition name did.
>>>> Let me elaborate. The files used to be in this directory structure:
>>>> /user/hive/warehouse/<tablename>/year=2009
>>>>
>>>> I created a new Hive external table with partition column name of 'yr'
>>>> instead of 'year' pointing to the same base directory. Is it possible that
>>>> this create statement deleted the data (I highly doubt that)? Either case
>>>> was unexpected to me!
>>>>
>>>> This is on Hive 1.0.
>>>>
>>>> Thanks,
>>>> Peyman
>>>>
>>>
>>>
>>
>

--001a113938eadecf56051e237555
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a113938eadecf56051e237555--