Date: Sun, 10 Feb 2013 08:45:46 +0530
Subject: Issues with writing data to Cassandra column family using a Hive script
From: Dinusha Dilrukshi
To: user@cassandra.apache.org

Hi All,

Data was originally stored in a column family called "test_cf". The definition of the column family is as follows:

CREATE COLUMN FAMILY test_cf
WITH COMPARATOR = 'IntegerType'
 AND key_validation_class = UTF8Type
 AND default_validation_class = FloatType;

The following is a sample of the data contained in "test_cf":

cqlsh:temp_ks> select * from test_cf;
 key            | column1    | value
----------------+------------+-------
 localhost:8282 | 1350468600 |    76
 localhost:8282 | 1350468601 |    76

The Hive script (shown at the end of this mail) is used to take the data from the above column family "test_cf" and insert it into a new column family called "cpu_avg_5min_new7". The column family definition of "cpu_avg_5min_new7" is the same as that of test_cf. The issue is that the data written to "cpu_avg_5min_new7" after executing the Hive script looks as follows. It is not in the same format as the data in the original column family "test_cf". Any explanation would be highly appreciated.

cqlsh:temp_ks> select * from cpu_avg_5min_new7;
 key            | column1                  | value
----------------+--------------------------+----------
 localhost:8282 | 232340574229062170849328 | 1.09e-05
 localhost:8282 | 232340574229062170849329 | 1.09e-05

Hive script:
----------------
drop table cpu_avg_5min_new7_hive;
CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING, start_time INT, cpu_avg FLOAT) STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES (
 "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , "cassandra.ks.name" = "temp_ks" ,
 "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
 "cassandra.columns.mapping" = ":key,:column,:value" , "cassandra.cf.name" = "cpu_avg_5min_new7" );

drop table xxx;
CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT, cpu_avg FLOAT) STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES (
 "cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" , "cassandra.ks.name" = "temp_ks" ,
 "cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
 "cassandra.columns.mapping" = ":key,:column,:value" , "cassandra.cf.name" = "test_cf" );

insert overwrite table cpu_avg_5min_new7_hive select src_id,start_time,cpu_avg from xxx;

Regards,
Dinusha.
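
A note on the numbers reported for "cpu_avg_5min_new7", offered as one possible reading rather than a confirmed diagnosis. The column names and values look like what you get if the text form of each number (for example the characters "1350468600") is stored as raw bytes, and those bytes are then decoded by the IntegerType comparator and the FloatType validator. The short Python sketch below only checks that arithmetic. The helper names are hypothetical, the text "76.0" for the float value is an assumption, and nothing here confirms how CassandraStorageHandler actually serialises data.

```python
import struct


def utf8_digits_as_integer(number: int) -> int:
    """Encode the decimal digits of `number` as UTF-8 text, then read
    those bytes back as one big-endian signed integer (a variable-length
    integer decoding, assumed here to resemble IntegerType)."""
    raw = str(number).encode("utf-8")
    return int.from_bytes(raw, byteorder="big", signed=True)


def utf8_text_as_float(text: str) -> float:
    """Read exactly four bytes of UTF-8 text as a big-endian
    IEEE 754 single-precision float."""
    raw = text.encode("utf-8")
    if len(raw) != 4:
        raise ValueError("a single-precision float needs exactly 4 bytes")
    return struct.unpack(">f", raw)[0]


# Column names from test_cf, reinterpreted as described above.
print(utf8_digits_as_integer(1350468600))  # 232340574229062170849328
print(utf8_digits_as_integer(1350468601))  # 232340574229062170849329

# Assumption: the value 76 was written as the four characters "76.0".
print(f"{utf8_text_as_float('76.0'):.3g}")  # 1.09e-05
```

Both results match the cqlsh output for "cpu_avg_5min_new7", which suggests the rows were written as text bytes rather than in the binary encodings the column family expects. Whether that comes from the column mapping, the Hive column types, or the storage handler itself is not something the post establishes.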