From: Sandeep Reddy P
To: user@hive.apache.org
Date: Fri, 7 Sep 2012 10:41:47 -0400
Subject: Re: How to load csv data into HIVE

Hi,

I wrote a shell script to clean up the CSV data, but when I run it on a 12 GB CSV file it takes a long time. Would a Python script be faster?

On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <Chuck.Connell@nuance.com> wrote:

> How about a Python script that changes it into plain tab-separated text?
> So it would look like this…
>
> 174969274<tab>14-mar-2006<tab>3522876<tab> <tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>
> etc…
>
> Tab-separated with newlines is easy to read and works perfectly on import.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
> 781-565-4611
>
> From: Sandeep Reddy P [mailto:sandeepreddy.3647@gmail.com]
> Subject: How to load csv data into HIVE
>
> Hi,
> Here is the sample data
> "174969274","14-mar-2006","3522876","","14-mar-2006","500000308","65","1"|
> "174969275","19-jul-2006","3523154","","19-jul-2006","500000308","65","1"|
> "174969276","31-dec-2005","3530333","","31-dec-2005","500000308","65","1"|
> "174969277","14-apr-2005","3531470","","14-apr-2005","500000308","65","1"|
>
> How to load this kind of data into HIVE?
> I'm using a shell script to get rid of the double quotes and '|', but it is
> taking a very long time on each CSV file, and the files are 12 GB each.
> What is the best way to do this?

--
Thanks,
sandeep
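
For reference, the Python conversion suggested in the quoted reply could look roughly like the sketch below. It is only an illustration, not a script from this thread: the function names are made up, and it assumes every record sits on one line, ends with '|', and contains no tab characters inside a field. It streams the input line by line, so the 12 GB file is never held in memory; whether it is actually faster than the shell script would have to be measured.

```python
import csv


def strip_trailing_pipe(lines):
    """Yield each line without its line ending and without a trailing '|'."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("|"):
            line = line[:-1]
        yield line


def convert(src, dst):
    """Stream quoted, comma-separated rows from src to dst as tab-separated text.

    src and dst are open text file objects (hypothetical names, chosen
    for this sketch). csv.reader removes the double quotes around fields.
    """
    for row in csv.reader(strip_trailing_pipe(src)):
        if not row:
            continue  # skip blank lines
        dst.write("\t".join(row) + "\n")
```

To use it, open the input and output files in text mode and pass them to convert(). The resulting file has one record per line with tab-separated fields, which matches a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.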