hadoop-common-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: Hive error when loading csv data.
Date Wed, 27 Jun 2012 02:11:54 GMT
What I am suggesting is to write a simple script, maybe in Python, that replaces only the
commas used as field delimiters with pipes and leaves the commas inside quoted fields alone.
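
A minimal sketch of such a script, assuming Python 3 and its standard csv module. The function name csv_to_pipe and the backslash escape character are illustrative choices, not something from this thread. The reader parses the quoted field "abc,def" as one value, and the writer re-emits each row with "|" as the delimiter and no quotes, so the embedded comma survives as plain data.

```python
import csv
import io


def csv_to_pipe(text: str) -> str:
    """Re-emit comma-separated text as pipe-delimited text.

    Quoted fields are parsed by csv.reader, so a comma inside quotes
    stays part of the field instead of splitting it.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter="|",
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        escapechar="\\",         # assumption: escape a literal "|" as "\|"
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# The two sample rows from the original question.
sample = 'c,zxy,xyz\nd,"abc,def",abcd\n'
print(csv_to_pipe(sample), end="")
# prints:
# c|zxy|xyz
# d|abc,def|abcd
```

The table would then be declared with fields terminated by '|'. If the data can itself contain a pipe, the table would also need a matching escape clause (escaped by '\\') so the escaped pipes are read back correctly.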

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 26, 2012, at 8:58 PM, Sandeep Reddy P <sandeepreddy.3647@gmail.com> wrote:

> If I do that, my data will be d|"abc|def"|abcd, so my problem is not solved.
> 
> On Tue, Jun 26, 2012 at 6:48 PM, Michel Segel <michael_segel@hotmail.com> wrote:
> 
>> Yup. I just didn't add the quotes.
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Jun 26, 2012, at 4:30 PM, Sandeep Reddy P <sandeepreddy.3647@gmail.com> wrote:
>> 
>>> Thanks for the reply.
>>> I didn't get that, Michael. My f2 should be "abc,def"
>>> 
>>> On Tue, Jun 26, 2012 at 4:00 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>>> 
>>>> Alternatively you could write a simple script to convert the csv to a
>>>> pipe-delimited file so that "abc,def" will be abc,def.
>>>> 
>>>> On Jun 26, 2012, at 2:51 PM, Harsh J wrote:
>>>> 
>>>>> Hive's delimited-fields-format record reader does not handle quoted
>>>>> text that carries the same delimiter within it. Excel supports such
>>>>> records, so it reads them fine.
>>>>> 
>>>>> You will need to create your table with a custom InputFormat class
>>>>> that can handle this (Try using OpenCSV readers, they support this),
>>>>> instead of relying on Hive to do this for you. If you're successful in
>>>>> your approach, please also consider contributing something back to
>>>>> Hive/Pig to help others.
>>>>> 
>>>>> On Wed, Jun 27, 2012 at 12:37 AM, Sandeep Reddy P
>>>>> <sandeepreddy.3647@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> Hi all,
>>>>>> I have a csv file with 46 columns but i'm getting error when i do some
>>>>>> analysis on that data type. For simplification i have taken 3 columns and
>>>>>> now my csv is like
>>>>>> c,zxy,xyz
>>>>>> d,"abc,def",abcd
>>>>>> 
>>>>>> i have created table for this data using,
>>>>>> hive> create table test3(
>>>>>>     > f1 string,
>>>>>>     > f2 string,
>>>>>>     > f3 string)
>>>>>>     > row format delimited
>>>>>>     > fields terminated by ",";
>>>>>> OK
>>>>>> Time taken: 0.143 seconds
>>>>>> hive> load data local inpath '/home/training/a.csv'
>>>>>>     > into table test3;
>>>>>> Copying data from file:/home/training/a.csv
>>>>>> Copying file: file:/home/training/a.csv
>>>>>> Loading data to table default.test3
>>>>>> OK
>>>>>> Time taken: 0.276 seconds
>>>>>> hive> select * from test3;
>>>>>> OK
>>>>>> c       zxy     xyz
>>>>>> d       "abc    def"
>>>>>> Time taken: 0.156 seconds
>>>>>> 
>>>>>> When i do select f2 from test3;
>>>>>> my results are,
>>>>>> OK
>>>>>> zxy
>>>>>> "abc
>>>>>> but this should be abc,def
>>>>>> When i open the same csv file with Microsoft Excel i got abc,def
>>>>>> How should i solve this error??
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Thanks,
>>>>>> sandeep
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Harsh J
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Thanks,
>>> sandeep
>> 
> 
> 
> 
> -- 
> Thanks,
> sandeep
