hive-user mailing list archives

From "Connell, Chuck" <>
Subject RE: How to load csv data into HIVE
Date Fri, 07 Sep 2012 14:57:57 GMT
I cannot promise which is faster. A lot depends on how clever your scripts are.

From: Sandeep Reddy P []
Sent: Friday, September 07, 2012 10:42 AM
Subject: Re: How to load csv data into HIVE

I wrote a shell script to process the CSV data, but when I run it on a 12GB CSV file it takes a long time. Would a Python script be faster?
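Whether Python beats the shell script depends heavily on the script. With several 12GB files, though, one option that can help is to process the files in parallel, one worker process per file. The sketch below assumes the same cleanup the shell script performs (deleting `"` and `|`). The function names `clean_file` and `clean_all` and the `.clean` output suffix are hypothetical. Parallelism only pays off if the machine has spare CPU cores and the disks can keep up. A job that is already disk-bound will not get faster.

```python
import multiprocessing
import sys

# Assumption: the same characters the shell script removes.
DELETE_CHARS = b'"|'
# Bytes read per chunk; an arbitrary, tunable choice.
CHUNK_SIZE = 16 * 1024 * 1024


def clean_file(paths):
    """Stream one input file to one output file, deleting DELETE_CHARS.

    Takes a (src_path, dst_path) tuple so it can be passed to Pool.imap_unordered.
    Returns the source path when finished.
    """
    src_path, dst_path = paths
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk.translate(None, DELETE_CHARS))
    return src_path


def clean_all(pairs, workers=None):
    """Clean several files concurrently; workers=None means one per CPU core."""
    with multiprocessing.Pool(processes=workers) as pool:
        for done in pool.imap_unordered(clean_file, pairs):
            print(f"finished {done}", file=sys.stderr)


# The __main__ guard is required so worker processes do not re-run this code.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python clean_all.py a.csv b.csv
    # writes a.csv.clean and b.csv.clean next to the inputs.
    clean_all([(p, p + ".clean") for p in sys.argv[1:]])
```

`imap_unordered` reports each file as soon as it is done, regardless of the order the files were given in.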
On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <> wrote:
How about a Python script that converts it into plain tab-separated text? It would look like this:

174969274<tab>14-mar-2006<tab>3522876<tab> <tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>

Tab-separated with newlines is easy to read and works perfectly on import.
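A minimal sketch of the conversion suggested above, assuming the input is comma-separated with fields wrapped in double quotes. The original sample did not survive in this archive, so the exact format is a guess; pass `delimiter="|"` if the file is actually pipe-separated. The function name `csv_to_tsv` and the command-line usage are hypothetical. Python's standard `csv` module does the quote handling.

```python
import csv
import sys


def csv_to_tsv(src, dst, delimiter=","):
    """Rewrite delimited, double-quoted records as plain tab-separated lines.

    Assumption: fields are separated by `delimiter` and may be wrapped in
    double quotes; csv.reader removes the quotes.
    """
    reader = csv.reader(src, delimiter=delimiter, quotechar='"')
    for fields in reader:
        # A tab or newline inside a field would break the one-record-per-line,
        # tab-separated layout, so replace those characters with spaces.
        cleaned = [
            f.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for f in fields
        ]
        dst.write("\t".join(cleaned) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python csv_to_tsv.py input.csv output.tsv
    # newline="" is what the csv module documentation recommends for its files.
    with open(sys.argv[1], newline="", encoding="utf-8") as src, open(
        sys.argv[2], "w", newline="", encoding="utf-8"
    ) as dst:
        csv_to_tsv(src, dst)
```

Because `csv.reader` understands quoting, a quoted field that contains the delimiter stays in one piece, which a blind search-and-replace would break.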

Chuck Connell
Nuance R&D Data Team
Burlington, MA

From: Sandeep Reddy P []
Subject: How to load csv data into HIVE

Here is the sample data

How do I load this kind of data into Hive?
I'm using a shell script to get rid of the double quotes and '|' characters, but it takes a very long time to process each CSV file (each one is 12GB). What is the best way to do this?
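If the goal is exactly what the shell script does, deleting every double quote and '|', a sketch like the following streams the file in large binary chunks and lets `bytes.translate` do the deletion in C. The names `strip_chars`, `DELETE_CHARS` and `CHUNK_SIZE` and the 16 MiB chunk size are arbitrary choices for illustration, not measured settings. Deleting single bytes is safe across chunk boundaries, and in UTF-8 the bytes for `"` and `|` never occur inside multi-byte characters.

```python
import sys

# Bytes the shell script was removing: double quotes and '|'.
DELETE_CHARS = b'"|'
# Bytes read per chunk; an arbitrary, tunable choice.
CHUNK_SIZE = 16 * 1024 * 1024


def strip_chars(src, dst, table=None, delete=DELETE_CHARS, chunk_size=CHUNK_SIZE):
    """Copy binary stream src to dst, deleting bytes in `delete`.

    If `table` (a 256-byte table from bytes.maketrans) is given, the remaining
    bytes are also mapped through it after the deletion.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.translate(table, delete=delete))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python strip_chars.py input.csv cleaned.txt
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        strip_chars(src, dst)
```

If '|' is really the field separator, deleting it would merge neighboring fields. In that case, calling `strip_chars(src, dst, table=bytes.maketrans(b"|", b"\t"), delete=b'"')` turns each '|' into a tab instead, producing the tab-separated layout suggested earlier in the thread.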

