From: "Raghunath, Ranjith"
To: "user@hive.apache.org"
Subject: RE: Need a smart way to delete the first row of my data
Date: Wed, 7 Mar 2012 16:12:21 +0000

Sorry, I didn't read that you were trying to drop the header row specifically. That said, the solution outlined below is probably not the way to go here. I like Matt's suggestion; it seems like a better approach.

From: Raghunath, Ranjith [mailto:Ranjith.Raghunath1@usaa.com]
Sent: Wednesday, March 07, 2012 10:06 AM
To: user@hive.apache.org
Subject: RE: Need a smart way to delete the first row of my data

 

Given a key column that is unique within your dataset, I think this could work.

1. Load the file as is, gunzipped, into a Hive table

2. Determine the total number of rows (<total_size>).

3. Perform an insert into table …. Select * from …. Order by <col_name> desc limit <total_size - 1>
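
The three steps above can be sketched on a toy dataset. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for Hive, and the table and column names (staging, cleaned, id, val) are invented for the example, not taken from the thread.

```python
import sqlite3

# Hypothetical stand-in data; sqlite3 replaces Hive purely for illustration.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE cleaned (id INTEGER, val TEXT)")

# Step 1: load the data as is.
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

# Step 2: determine the total number of rows.
(total_size,) = conn.execute("SELECT COUNT(*) FROM staging").fetchone()

# Step 3: copy all but one row, ordering by the unique key descending.
conn.execute(
    "INSERT INTO cleaned SELECT * FROM staging ORDER BY id DESC LIMIT ?",
    (total_size - 1,),
)

kept_ids = [r[0] for r in conn.execute("SELECT id FROM cleaned ORDER BY id")]
print(kept_ids)
```

Note that the row left out is the one with the smallest key, which is the first row of the file only if the key order happens to match the file order.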


From: Dan Y [mailto:dan.m.yelle@gmail.com]
Sent: Wednesday, March 07, 2012 10:01 AM
To: user@hive.apache.org
Subject: Need a smart way to delete the first row of my data

 

Hello,

 

I have huge gzipped files that I need to drop the header row from before loading to a Hive table.

 

Right now, my process is:

1. Gunzip the data (...takes forever)

2. Drop the first row using the Unix sed command

3. Re-zip the data with gzip -1 (...takes forever)

4. Create the Hive table (on the compressed file to store it efficiently)
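
For reference, steps 1 to 3 above can be sketched as a single streaming pass. This is a minimal sketch, not the commands actually run in the thread: it assumes Python's standard gzip module is available, and the function name drop_first_line and any file paths are hypothetical. It still pays the full cost of decompressing and recompressing; it only avoids writing an uncompressed intermediate file.

```python
import gzip
import shutil


def drop_first_line(src_path: str, dst_path: str) -> None:
    """Copy a gzipped text file to dst_path, omitting its first line."""
    with gzip.open(src_path, "rb") as src:
        # compresslevel=1 mirrors the `gzip -1` setting from step 3.
        with gzip.open(dst_path, "wb", compresslevel=1) as dst:
            src.readline()                # read and discard the header row
            shutil.copyfileobj(src, dst)  # stream the remainder in chunks
```

A call such as `drop_first_line("data.gz", "data_noheader.gz")` (hypothetical file names) would produce the file that step 4 then loads.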

 

I am trying to find a way to speed up this process. Ideally, it would involve loading the data to Hive as a first step and then deleting the first row, to avoid the unzip/rezip steps.

 

Any ideas would be appreciated!

 

-Dan

 
