incubator-cassandra-user mailing list archives

From Dinusha Dilrukshi <sdddilruk...@gmail.com>
Subject Issues with writing data to Cassandra column family using a Hive script
Date Sun, 10 Feb 2013 03:15:46 GMT
Hi All,

The data was originally stored in a column family called "test_cf". The
column family is defined as follows:

CREATE COLUMN FAMILY test_cf
WITH COMPARATOR = 'IntegerType'
 AND key_validation_class = UTF8Type
 AND default_validation_class = FloatType;

The following is a sample of the data in "test_cf":

cqlsh:temp_ks> select * from test_cf;
 key            | column1    | value
----------------+------------+-------
 localhost:8282 | 1350468600 |    76
 localhost:8282 | 1350468601 |    76


The Hive script shown at the end of this mail reads the data from "test_cf"
and inserts it into a new column family called "cpu_avg_5min_new7", which has
the same definition as "test_cf". The problem is that the data written to
"cpu_avg_5min_new7" after running the script, shown below, does not match the
format of the data in the original column family "test_cf". Any explanation
would be highly appreciated.


cqlsh:temp_ks> select * from cpu_avg_5min_new7;
 key            | column1                  | value
----------------+--------------------------+----------
 localhost:8282 | 232340574229062170849328 | 1.09e-05
 localhost:8282 | 232340574229062170849329 | 1.09e-05
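
One observation, offered as a guess rather than a diagnosis: the garbled
values look like the text form of the original values stored as raw bytes.
The Python sketch below is not part of the Hive job. It reads the ASCII bytes
of "1350468600" as a big-endian integer (the way IntegerType decodes a column
name) and the ASCII bytes of "76.0" as a 4-byte IEEE 754 float (the way
FloatType decodes a value). The spelling "76.0" is an assumption about how the
FLOAT is rendered as text, and the helper names are made up for illustration.

```python
import struct


def as_integer_type(text: str) -> int:
    """Read the ASCII bytes of `text` as a big-endian signed integer."""
    return int.from_bytes(text.encode("ascii"), byteorder="big", signed=True)


def as_float_type(text: str) -> float:
    """Read exactly four ASCII bytes as a big-endian IEEE 754 float."""
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FloatType value is 4 bytes long")
    return struct.unpack(">f", raw)[0]


# Column names from test_cf, treated as text instead of numbers.
print(as_integer_type("1350468600"))
print(as_integer_type("1350468601"))

# Value 76, assumed to be rendered as the text "76.0".
print("%.3g" % as_float_type("76.0"))
```

The three printed results match the column1 and value entries in the
"cpu_avg_5min_new7" output above, which suggests the column name and value are
being written as strings rather than as IntegerType and FloatType bytes. I have
not confirmed where in the Hive-to-Cassandra path that would happen.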


Hive script:
----------------
drop table cpu_avg_5min_new7_hive;
CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive
  (src_id STRING, start_time INT, cpu_avg FLOAT)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160",
  "cassandra.ks.name" = "temp_ks",
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx",
  "cassandra.columns.mapping" = ":key,:column,:value",
  "cassandra.cf.name" = "cpu_avg_5min_new7" );

drop table xxx;
CREATE EXTERNAL TABLE IF NOT EXISTS xxx
  (src_id STRING, start_time INT, cpu_avg FLOAT)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160",
  "cassandra.ks.name" = "temp_ks",
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx",
  "cassandra.columns.mapping" = ":key,:column,:value",
  "cassandra.cf.name" = "test_cf" );

insert overwrite table cpu_avg_5min_new7_hive
  select src_id, start_time, cpu_avg from xxx;

Regards,
Dinusha.
