hive-user mailing list archives

From Yichuan Hu <huyich...@gmail.com>
Subject Re: Hive Loading Nulls When Using RegEx
Date Fri, 01 Jul 2011 23:25:48 GMT
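Use \\d instead of \d in input.regex. Hive treats the backslash as an escape character inside the quoted string, so a single \d most likely reaches RegexSerDe as a plain d. The pattern then matches none of your lines, which is why every column comes back NULL.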
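
For reference, here is a sketch of your second table definition with the backslash doubled. I have not run it against your data, and it assumes nothing else in the script needs to change; every line other than input.regex is copied from your message as-is.

```sql
-- Sketch only: identical to the two-field definition quoted below,
-- except that the backslash in input.regex is doubled.
CREATE TABLE test
(
  field_1 STRING,
  field_2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES
(
  -- "\\d" inside the quoted string should reach the SerDe as the regex token \d
  "input.regex" = "([a-z,A-Z]*)(-\\d*)",
  -- left unchanged from the original message
  "output.regex" = "%1$s %2$s"
)
STORED AS TEXTFILE;
```

After recreating the table, reload the file and rerun the SELECT to check that both columns are populated.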

On Jul 1, 2011, at 6:52 PM, Sal Scalisi <sals99@hotmail.com> wrote:

> I'm new to Hive and I'm having an issue loading a simple set of data via regex.
> 
> I have a data file called test.txt that contains the following: 
> 
> TESTONE-1 
> TESTTWO-2 
> TESTTHREE-3 
> TESTFOUR-4 
> TESTFIVE-5 
> 
> I have this hive script: 
> 
> hive> CREATE TABLE test 
> > ( 
> >  field_1 STRING 
> > ) 
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
> > WITH SERDEPROPERTIES 
> > ( 
> >  "input.regex" = "([^ ]*)", 
> >  "output.regex" = "%1$s" 
> > ) 
> > STORED AS TEXTFILE; 
> Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe 
> OK 
> Time taken: 0.064 seconds 
> 
> hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test; 
> Copying data from file:/home/hadoop/test 
> Loading data to table test 
> OK 
> Time taken: 0.213 seconds 
> 
> hive> SELECT * FROM test LIMIT 10; 
> OK 
> TESTONE-1 
> TESTTWO-2 
> TESTTHREE-3 
> TESTFOUR-4 
> TESTFIVE-5 
> Time taken: 0.153 seconds 
> 
> Which produces the expected output. 
> 
> When I alter the hive script to include two fields, I get all null values: 
> 
> hive> CREATE TABLE test 
> > ( 
> >  field_1 STRING, 
> >  field_2 STRING 
> > ) 
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
> > WITH SERDEPROPERTIES 
> > ( 
> >  "input.regex" = "([a-z,A-Z]*)(-\d*)", 
> >  "output.regex" = "%1$s %2$s" 
> > ) 
> > STORED AS TEXTFILE; 
> Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe 
> OK 
> Time taken: 0.025 seconds 
> 
> hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test; 
> Copying data from file:/home/hadoop/test 
> Loading data to table test 
> OK 
> Time taken: 0.187 seconds 
> 
> hive> SELECT * FROM test LIMIT 10; 
> OK 
> NULL    NULL 
> NULL    NULL 
> NULL    NULL 
> NULL    NULL 
> NULL    NULL 
> Time taken: 0.162 seconds 
> 
> I've checked the regular expression against http://regexpal.com/ and it seems to check out. I think there may be an issue with SerDe, but I don't know how to go about troubleshooting it.
> 
> I'm running this on Amazon's Elastic MapReduce 
> 
> Any help is appreciated. 
> 
> -Sal 
