hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bejoy Ks <bejoy...@yahoo.com>
Subject Hive queries not retrieving any values from LZO compressed tables
Date Thu, 29 Sep 2011 16:50:57 GMT
Hi Experts
            I'm currently struck when using compression with hive. My requirement
is I need to store data in hive tables in LZO compressed format and should be able to retrieve
the same successfully. However with the hive wiki and mailing list archives I was able to
succeed only on the first part 'Storing data in LZO compressed format in hive tables' but
failing with the second part 'Retrieving data back from the table'.

I have a non partitioned table that holds uncompressed text data and want to load this data
into a partitioned table in LZO format. I used Hive Dynamic Partition concept to populate
data into partitions for the same. On execution of the hive QL  data is loaded in compressed
format on target table, the hive CLI showed the no of rows loaded which was correct. Also
I verified the directory that corresponded to the table in hdfs, there also I could find sub
folders and files within with .deflate extension in sub folders. But when I fire any query
on this table I can see the map reduce job executing and showing an OK finally but No results
are returned under any conditions. It is the same with a simple 'Select *' query as well,
Just  an OK message with no rows returned. On the hive CLI I enabled the following parameters.

//for output compress
set hive.exec.compress.output=true;

//for dynamic partitions

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

I have posted below the queries i have executed from my end. Could any one please guide me
on how you have actually implemented LZO on your hive table  or help me out in crossing this
hurdle? 


Thanks in Advance

Source table

CREATE EXTERNAL TABLE bejoy_test_employee(eemployeeId Int, name String,codeId Int,country
String )
COMMENT 'test table for LZO compression' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/u/bejoy/external_tables/bejoy_test_employee' ;

Target Table

CREATE EXTERNAL TABLE bejoy_test_employee_partn(employeeId Int, name String,codeId Int)
COMMENT 'test table for LZO compression' 
PARTITIONED BY(country String) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/u/bejoy/external_tables/bejoy_test_employee_partn' ;

Query used for Dynamic Partitions
INSERT OVERWRITE TABLE bejoy_test_employee_partn PARTITION(country)
SELECT eemployeeId,name,codeId,country FROM bejoy_test_employee;


//Hive CLI log once the Dynamic Partitions query is executed
Partition default.bejoy_test_employee_partn{country=Australia} stats: [num_files: 1, num_rows:
0, total_size: 62114]
Partition default.bejoy_test_employee_partn{country=Canada} stats: [num_files: 1, num_rows:
0, total_size: 62327]
Partition default.bejoy_test_employee_partn{country=France} stats: [num_files: 1, num_rows:
0, total_size: 62377]
Partition default.bejoy_test_employee_partn{country=Germany} stats: [num_files: 1, num_rows:
0, total_size: 61667]
Partition default.bejoy_test_employee_partn{country=India} stats: [num_files: 1, num_rows:
0, total_size: 62341]
Partition default.bejoy_test_employee_partn{country=Italy} stats: [num_files: 1, num_rows:
0, total_size: 62463]
Partition default.bejoy_test_employee_partn{country=Japan} stats: [num_files: 1, num_rows:
0, total_size: 62201]
Partition default.bejoy_test_employee_partn{country=Middle-East} stats: [num_files: 1, num_rows:
0, total_size: 62360]
Partition default.bejoy_test_employee_partn{country=UK} stats: [num_files: 1, num_rows: 0,
total_size: 62334]
Partition default.bejoy_test_employee_partn{country=USA} stats: [num_files: 1, num_rows: 0,
total_size: 62191]
Table default.bejoy_test_employee_partn stats: [num_partitions: 10, num_files: 10, num_rows:
0, total_size: 622375]
101040 Rows loaded to bejoy_test_employee_partn
OK
Time taken: 15.576 seconds


//hive CLI log once queries are fired on the table
hive> select * from  bejoy_test_employee_partn;
OK
Time taken: 0.074 seconds

Regards
Bejoy.K.S
Mime
View raw message