hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Warren <james.war...@stanfordalumni.org>
Subject Re: Hive queries not retrieving any values from LZO compressed tables
Date Fri, 30 Sep 2011 22:51:36 GMT
Hello Bejoy -

Even though you defined your target table by specifying the input format,
you still have to ensure the output of your population query is compressed.
 Make sure the following are set before executing your "insert overwrite
table " query.

set mapred.output.compress=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
set hive.exec.compress.output=true;

(This is assuming lzo is installed on your cluster; see
https://github.com/toddlipcon/hadoop-lzo if it's not.)

cheers,
-James


On Thu, Sep 29, 2011 at 9:50 AM, Bejoy Ks <bejoy_ks@yahoo.com> wrote:

> Hi Experts
>             I'm currently struck when using compression with hive. My
> requirement is I need to store data in hive tables in LZO compressed format
> and should be able to retrieve the same successfully. However with the hive
> wiki and mailing list archives I was able to succeed only on the first part
> 'Storing data in LZO compressed format in hive tables' but failing with the
> second part 'Retrieving data back from the table'.
>
> I have a non partitioned table that holds uncompressed text data and want
> to load this data into a partitioned table in LZO format. I used Hive
> Dynamic Partition concept to populate data into partitions for the same. On
> execution of the hive QL  data is loaded in compressed format on target
> table, the hive CLI showed the no of rows loaded which was correct. Also I
> verified the directory that corresponded to the table in hdfs, there also I
> could find sub folders and files within with .deflate extension in sub
> folders. But when I fire any query on this table I can see the map reduce
> job executing and showing an OK finally but No results are returned under
> any conditions. It is the same with a simple 'Select *' query as well, Just
> an OK message with no rows returned. On the hive CLI I enabled the following
> parameters.
>
> //for output compress
> set hive.exec.compress.output=true;
> //for dynamic partitions
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
>
> I have posted below the queries i have executed from my end. Could any one
> please guide me on how you have actually implemented LZO on your hive table
> or help me out in crossing this hurdle?
>
> Thanks in Advance
>
> Source table
> CREATE EXTERNAL TABLE bejoy_test_employee(eemployeeId Int, name
> String,codeId Int,country String )
> COMMENT 'test table for LZO compression'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/u/bejoy/external_tables/bejoy_test_employee' ;
>
> Target Table
> CREATE EXTERNAL TABLE bejoy_test_employee_partn(employeeId Int, name
> String,codeId Int)
> COMMENT 'test table for LZO compression'
> PARTITIONED BY(country String)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION '/u/bejoy/external_tables/bejoy_test_employee_partn' ;
>
> Query used for Dynamic Partitions
> INSERT OVERWRITE TABLE bejoy_test_employee_partn PARTITION(country)
> SELECT eemployeeId,name,codeId,country FROM bejoy_test_employee;
>
>
> //Hive CLI log once the Dynamic Partitions query is executed
> Partition default.bejoy_test_employee_partn{country=Australia} stats:
> [num_files: 1, num_rows: 0, total_size: 62114]
> Partition default.bejoy_test_employee_partn{country=Canada} stats:
> [num_files: 1, num_rows: 0, total_size: 62327]
> Partition default.bejoy_test_employee_partn{country=France} stats:
> [num_files: 1, num_rows: 0, total_size: 62377]
> Partition default.bejoy_test_employee_partn{country=Germany} stats:
> [num_files: 1, num_rows: 0, total_size: 61667]
> Partition default.bejoy_test_employee_partn{country=India} stats:
> [num_files: 1, num_rows: 0, total_size: 62341]
> Partition default.bejoy_test_employee_partn{country=Italy} stats:
> [num_files: 1, num_rows: 0, total_size: 62463]
> Partition default.bejoy_test_employee_partn{country=Japan} stats:
> [num_files: 1, num_rows: 0, total_size: 62201]
> Partition default.bejoy_test_employee_partn{country=Middle-East} stats:
> [num_files: 1, num_rows: 0, total_size: 62360]
> Partition default.bejoy_test_employee_partn{country=UK} stats: [num_files:
> 1, num_rows: 0, total_size: 62334]
> Partition default.bejoy_test_employee_partn{country=USA} stats: [num_files:
> 1, num_rows: 0, total_size: 62191]
> Table default.bejoy_test_employee_partn stats: [num_partitions: 10,
> num_files: 10, num_rows: 0, total_size: 622375]
> 101040 Rows loaded to bejoy_test_employee_partn
> OK
> Time taken: 15.576 seconds
>
> //hive CLI log once queries are fired on the table
> hive> select * from  bejoy_test_employee_partn;
> OK
> Time taken: 0.074 seconds
>
> Regards
> Bejoy.K.S
>
>

Mime
View raw message