hive-user mailing list archives

From Sal Scalisi <sal...@hotmail.com>
Subject Hive Loading Nulls When Using RegEx
Date Fri, 01 Jul 2011 22:52:21 GMT
I'm new to Hive and I'm having trouble loading a simple data set with a
regex (RegexSerDe).

I have a data file called test.txt that contains the following:

TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5

I have this Hive script:

hive> CREATE TABLE test
 > (
 >  field_1 STRING
 > )
 > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
 > WITH SERDEPROPERTIES
 > (
 >  "input.regex" = "([^ ]*)",
 >  "output.regex" = "%1$s"
 > )
 > STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
OK
Time taken: 0.064 seconds

hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.213 seconds

hive> SELECT * FROM test LIMIT 10;
OK
TESTONE-1
TESTTWO-2
TESTTHREE-3
TESTFOUR-4
TESTFIVE-5
Time taken: 0.153 seconds

This produces the expected output.
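One way to see why the one-column table works is to run the same input.regex through java.util.regex outside Hive. This sketch assumes, without confirming it here, that RegexSerDe requires the regex to match the entire line, as Matcher.matches() does. SingleFieldCheck and firstGroup are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check: apply the single-field input.regex to each sample line
// using a full-line match (Matcher.matches()), assumed to mirror RegexSerDe.
class SingleFieldCheck {
    // Returns capture group 1 if the WHOLE line matches, otherwise null.
    static String firstGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {"TESTONE-1", "TESTTWO-2", "TESTTHREE-3", "TESTFOUR-4", "TESTFIVE-5"};
        for (String line : lines) {
            // ([^ ]*) consumes every non-space character; these lines contain
            // no spaces, so group 1 spans the entire line.
            System.out.println(firstGroup("([^ ]*)", line));
        }
    }
}
```

A line containing a space (for example "A B") would not match in full, so firstGroup returns null for it.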

When I change the Hive script to define two fields, every value comes back NULL:

hive> CREATE TABLE test
 > (
 >  field_1 STRING,
 >  field_2 STRING
 > )
 > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
 > WITH SERDEPROPERTIES
 > (
 >  "input.regex" = "([a-z,A-Z]*)(-\d*)",
 >  "output.regex" = "%1$s %2$s"
 > )
 > STORED AS TEXTFILE;
Found class for org.apache.hadoop.hive.contrib.serde2.RegexSerDe
OK
Time taken: 0.025 seconds

hive> LOAD DATA LOCAL INPATH '/home/hadoop/test' OVERWRITE INTO TABLE test;
Copying data from file:/home/hadoop/test
Loading data to table test
OK
Time taken: 0.187 seconds

hive> SELECT * FROM test LIMIT 10;
OK
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
Time taken: 0.162 seconds
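A simplified model of what a regex-based SerDe does with each line may help explain rows that are NULL in every column. This is a hypothetical sketch, not the RegexSerDe source: it assumes that a line which fails to match input.regex in full yields NULL for every column, and that capture group N feeds column N. RegexRowModel and parseRow are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of regex-based row parsing (NOT the actual
// RegexSerDe code). Assumes the pattern has exactly numColumns groups.
class RegexRowModel {
    static List<String> parseRow(Pattern inputPattern, int numColumns, String line) {
        Matcher m = inputPattern.matcher(line);
        List<String> row = new ArrayList<>();
        if (!m.matches()) {
            // No full-line match: every column becomes null (Hive prints NULL).
            for (int i = 0; i < numColumns; i++) {
                row.add(null);
            }
            return row;
        }
        // Capture group i + 1 supplies column i.
        for (int i = 0; i < numColumns; i++) {
            row.add(m.group(i + 1));
        }
        return row;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("([a-z,A-Z]*)(-\\d*)"); // regex text: ([a-z,A-Z]*)(-\d*)
        System.out.println(parseRow(p, 2, "TESTONE-1")); // [TESTONE, -1]
        System.out.println(parseRow(p, 2, "TESTONE 1")); // [null, null]
    }
}
```

Under this model, uniform NULL NULL rows suggest that no line matched the pattern the SerDe actually received.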

I've checked the regular expression against http://regexpal.com/ and it
seems to check out. I think there may be an issue with the SerDe, but I
don't know how to go about troubleshooting it.
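One thing regexpal.com does not exercise is how Hive parses the quoted SERDEPROPERTIES value before Java sees the regex (regexpal also runs JavaScript's regex engine, not Java's). A plausible suspect, stated here as an assumption to verify rather than a diagnosis, is that the backslash in \d is consumed as a string escape, leaving the regex -d*. This sketch compares both patterns with a full-line match. BackslashCheck and fullMatch are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothesis check, not a confirmed diagnosis: compare the regex as typed
// with what would remain if the backslash before "d" were dropped while
// the quoted SERDEPROPERTIES value is parsed.
class BackslashCheck {
    static boolean fullMatch(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String asTyped = "([a-z,A-Z]*)(-\\d*)";     // regex text: ([a-z,A-Z]*)(-\d*)
        String backslashLost = "([a-z,A-Z]*)(-d*)";  // \d collapsed to a literal d
        String line = "TESTONE-1";
        System.out.println(fullMatch(asTyped, line));       // true
        System.out.println(fullMatch(backslashLost, line)); // false: "1" is left over
    }
}
```

If this hypothesis holds, writing the property as "([a-z,A-Z]*)(-\\d*)" in the CREATE TABLE statement is the usual way to get a literal \d through to the regex engine. Note that field_2 would then hold "-1" rather than "1", because the hyphen sits inside the second group.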

I'm running this on Amazon Elastic MapReduce.

Any help is appreciated.

-Sal
