hadoop-common-user mailing list archives

From hadoopman <hadoop...@gmail.com>
Subject Re: Using space as field separator fails. How do I fix this?
Date Tue, 05 Apr 2011 05:23:16 GMT
Great tip.  I'll give it a try.

Thanks!


On 04/04/2011 10:17 PM, Alex Kozlov wrote:
> Try using octal, i.e. '\040'.
>
> On Apr 4, 2011, at 8:21 PM, hadoopman <hadoopman@gmail.com> wrote:
>
>    
>> I had a similar problem, though my logs were terminated with a carriage return.  Many
>> of the fields in my logs are delimited with a space.  We tried using \s, but that basically
>> removed every instance of the letter s (yeah, I thought that was amusing too).  In some cases
>> we were able to use \\t, but that didn't seem to work very well with our logs.  We are now
>> using the regex SerDe with a regex delimiter we hand-built to make it work.  So far so good.
>> Perhaps this is where you need to go.  I'm still learning how that works myself.  Exciting
>> stuff!!
>>
>>
>>
>> On 04/04/2011 03:50 AM, Bjørn Remseth wrote:
>>      
>>> Hi guys
>>>
>>> I'm having a problem:  I'm reading a file where fields are terminated
>>> by space (' ', ascii 32) into a table.  I'm not making these files
>>> so I can't easily change this use of ' ' as field separator.
>>>
>>> DROP TABLE logdata;
>>>
>>> CREATE EXTERNAL TABLE logdata(
>>>        xxx STRING,
>>>        yyy STRING,
>>>        ...
>>>        z_t)
>>>    ROW FORMAT DELIMITED
>>>    FIELDS TERMINATED BY ' '
>>>    STORED AS TEXTFILE;
>>>
>>> LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
>>> TABLE logdata;
>>>
>>>
>>> This fails: All the data is read into the first field (xxx).  If I
>>> change the field separator to something else, e.g. "," things work
>>> normally and I get to read the fields into their proper places
>>> in the record, but then I have to edit the datafiles first and I don't
>>> really want to do that.
>>>
>>> Do you know how I can most easily read my logfiles?
>>>
>>> Bjørn
>>>
>>>
>>>
>>>
>>>        
>>      
>    
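
For reference, the octal suggestion quoted above could be applied to the
original table definition roughly as follows.  This is a minimal sketch, not
a tested fix: the table name logdata_octal and its three columns are
hypothetical stand-ins for the real schema, and whether '\040' behaves
differently from ' ' may depend on the Hive version in use.

```sql
-- Hypothetical table and column names, for illustration only.
-- '\040' is the octal escape for ASCII 32 (space).
CREATE EXTERNAL TABLE logdata_octal(
       xxx STRING,
       yyy STRING,
       zzz STRING)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\040'
   STORED AS TEXTFILE;
```

If every row still lands in the first column, the regex SerDe route
mentioned in the thread is the other option to try.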

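The regex SerDe approach mentioned in the thread might look something like
the sketch below.  It assumes the contrib RegexSerDe class
(org.apache.hadoop.hive.contrib.serde2.RegexSerDe) is on the classpath,
typically by adding the hive-contrib jar for your installation first.  The
table name logdata_regex, the three columns, and the pattern itself are
hypothetical; the pattern uses [^ ] rather than \s to sidestep the
backslash-escaping trouble described above.

```sql
-- Assumption: the hive-contrib jar has already been added to the session.
-- Hypothetical table and columns; this SerDe expects STRING columns,
-- one per capture group in input.regex.
CREATE EXTERNAL TABLE logdata_regex(
       xxx STRING,
       yyy STRING,
       zzz STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES (
       "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)",
       "output.format.string" = "%1$s %2$s %3$s"
   )
   STORED AS TEXTFILE;
```

Each capture group maps to one column in order, so the pattern has to be
extended to match the real number of fields.  Rows that do not match the
pattern generally come back with NULL in every column, which makes a quick
SELECT a useful sanity check.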
