hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: Using space as field separator fails. How do I fix this?
Date Tue, 05 Apr 2011 04:17:53 GMT
Try using octal, i.e. '\040'.
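
Here is a minimal sketch of that suggestion applied to the table from the question below. It assumes Hive unescapes the three-digit octal sequence '\040' in the string literal to a single space (ASCII 32). Only the first two columns are shown; the elided columns would go back in unchanged.

```sql
-- Same table as in the question, with the space delimiter written as an
-- octal escape. Assumption: Hive turns '\040' into a single space (ASCII 32).
DROP TABLE logdata;

CREATE EXTERNAL TABLE logdata(
      xxx STRING,
      yyy STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\040'
  STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
TABLE logdata;

-- Quick check: values should now land in separate columns.
SELECT xxx, yyy FROM logdata LIMIT 5;
```

If the SELECT still shows whole lines in xxx, the delimiter did not take effect, and the regex SerDe route described in the quoted reply is the fallback.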

On Apr 4, 2011, at 8:21 PM, hadoopman <hadoopman@gmail.com> wrote:

> I had a similar problem, though my logs were terminated with carriage returns. Many of
> the fields in my logs are delimited with a space. We tried using \s, but that basically
> removed every instance of the letter s (yeah, I thought that was amusing too). In some cases
> we were able to use \\t, but that didn't work very well with our logs. We are using
> the regex SerDe with a regex we built by hand to make it work. So far so good.
> Perhaps this is where you need to go. I'm still learning how that works myself. Exciting
> stuff!!
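
Here is a rough sketch of the regex SerDe route. It assumes the contrib class org.apache.hadoop.hive.contrib.serde2.RegexSerDe is made available with ADD JAR. The jar path, the table name logdata_regex, and the three-column layout are hypothetical. The regex uses [^ ]* (a run of non-space characters), so it needs no backslashes at all. If you do want \s, write it as \\s. Hive appears to drop the backslash from escape sequences it does not recognize, so '\s' becomes a plain s. That would explain the "every s removed" symptom above.

```sql
-- Hypothetical jar path: point this at the hive-contrib jar of your install.
ADD JAR /path/to/hive-contrib.jar;

-- RegexSerDe maps capture group N to column N; its columns must all be STRING.
-- "([^ ]*) ([^ ]*) (.*)" = first field, second field, rest of the line.
CREATE EXTERNAL TABLE logdata_regex(
      xxx STRING,
      yyy STRING,
      rest STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) (.*)"
  )
  STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
TABLE logdata_regex;
```

Each extra space-separated field needs one more ([^ ]*) group and one more STRING column.
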
> 
> On 04/04/2011 03:50 AM, Bjørn Remseth wrote:
>> Hi guys
>> 
>> I'm having a problem: I'm reading a file where fields are terminated
>> by a space (' ', ASCII 32) into a table. I'm not making these files,
>> so I can't easily change this use of ' ' as the field separator.
>> 
>> DROP TABLE logdata;
>> 
>> CREATE EXTERNAL TABLE logdata(
>>       xxx STRING,
>>       yyy STRING,
>>       ...
>>       z_t)
>>   ROW FORMAT DELIMITED
>>   FIELDS TERMINATED BY ' '
>>   STORED AS TEXTFILE;
>> 
>> LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
>> TABLE logdata;
>> 
>> 
>> This fails: all the data is read into the first field (xxx). If I
>> change the field separator to something else, e.g. ",", things work
>> normally and the fields are read into their proper places
>> in the record, but then I have to edit the data files first, and I don't
>> really want to do that.
>> 
>> Do you know how I can most easily read my log files?
>> 
>> Bjørn
