cassandra-user mailing list archives

From William Oberman <>
Subject prep for cassandra storage from pig
Date Wed, 15 Jun 2011 18:17:02 GMT
I think I'm stuck on typing issues while trying to store data in Cassandra. To
verify: Cassandra wants (key, {tuples}).

My pig script is fairly brief:
raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
(key:chararray, columns:bag {column:tuple (name, value)});
-- columns == timeUUID -> JSON
rows = FOREACH raw GENERATE key, FLATTEN(columns);
alias_target_day = FOREACH rows {
    --I wrote a specialized parser that does exactly what I need
    observation_map = com.civicscience.pig.ParseObservation($2);
    GENERATE $0 as alias, observation_map#'_fqt' as target,
        observation_map#'_day' as day;
};
grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
COUNT($1)) as day_count;

This gets me:
(targetA, (day1, count))
(targetA, (day2, count))
(targetB, (day1, count))

But, cassandra wants the 2nd item to be a bag.  So, I tried:
X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
COUNT($1))) as day_count;

But this results in:
(targetA, {((day1, count))})
(targetA, {((day2, count))})
(targetB, {((day1, count))})
It's hard to see, but the bag now holds a tuple wrapped inside another tuple,
((day, count)) rather than (day, count), which is still bad.
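
One way to make the nesting easier to see is to ask Pig for the inferred schema instead of reading the dumped rows. The following is only a minimal sketch: the input file 'pairs.tsv' and the relation names t, as_tuple, and as_bag are hypothetical stand-ins, not part of the script above. It builds the same two shapes from a flat relation and prints their schemas with DESCRIBE.

```pig
-- Hypothetical input: a tab-separated file with target, day, and a
-- precomputed count (assumed only for this illustration).
t = LOAD 'pairs.tsv' AS (target:chararray, day:chararray, n:long);

-- Shape from the first attempt: second field is a plain tuple.
as_tuple = FOREACH t GENERATE target, TOTUPLE(day, n) AS day_count;

-- Shape from the second attempt: the tuple is passed through TOBAG.
as_bag = FOREACH t GENERATE target, TOBAG(TOTUPLE(day, n)) AS day_count;

-- Print the schema Pig infers for each relation.
DESCRIBE as_tuple;
DESCRIBE as_bag;
```

Comparing the two DESCRIBE results shows how many tuple levels sit inside the bag, which is the detail that is hard to spot in the dumped output.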

How do I get (key, {tuple})?  I wasn't sure where to post this (pig or
cassandra), so I'm posting to the pig list too.

