cassandra-user mailing list archives

From Eric Lee <e...@c11software.com>
Subject Re: Cassandra and Pig - how to get column values?
Date Sat, 16 Oct 2010 20:55:47 GMT
I have this working now with the following:

rows = LOAD 'cassandra://TwitterExample/User' using CassandraStorage();
cols = FOREACH rows GENERATE FLATTEN((bag{tuple(chararray,chararray)})$1);
users = FOREACH cols GENERATE $1;

Not sure if that operation with cols is correct or not, but it appears to
be working. Any thoughts would be appreciated.
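
For readers following along, here is a sketch of the same approach with named fields, so the value can be selected by name instead of by position. It assumes the cast of $1 to a bag of (chararray, chararray) tuples succeeds as it did above. The aliases key, name and value are illustrative and do not come from CassandraStorage or the README.

```pig
rows = LOAD 'cassandra://TwitterExample/User' USING CassandraStorage();

-- Cast the untyped bag in $1, flatten it, and name the resulting fields.
-- key, name and value are illustrative aliases (an assumption).
cols = FOREACH rows GENERATE
           $0 AS key,
           FLATTEN((bag{tuple(chararray,chararray)})$1) AS (name, value);

-- Keep only the userName column, then project its value.
named = FILTER cols BY name == 'userName';
users = FOREACH named GENERATE value;

DUMP users;
```

With the cast in place, DESCRIBE cols should report separate fields rather than a single bytearray, which is why $1 (here, value) is no longer out of bounds.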

Eric.

On Fri, Oct 15, 2010 at 8:02 PM, Eric Lee <eric@c11software.com> wrote:

> Hey guys,
>
> I'm having a problem with Pig and Cassandra and was hoping someone could
> point me in the right direction. I've set up Pig and Cassandra and I'm able
> to run through the example shown in the README.txt - I can view a list of
> top column names. That's all good stuff.
>
> What I would like to do next is just dump out the column values. Suppose I
> have a very simple Column Family called User. To that column family, I've
> added 2 rows of data, each row just has 1 column 'userName'. I'm using a
> GUID as my key.
>
> When I load and dump my rows, I get some data like:
>
> (6c7fef29-16dd-44ca-bde1-f53995b2e818,{(userName,someUserName1)})
> (8be0b934-45aa-444f-90e2-ce7137a73b68,{(userName,someUserName2)})
> (c51fc8ce-2a53-46bb-b872-0f644b972f62,{(userName,someUserName3)})
>
> As I understand it, at this point, the GUID is $0 and $1 is the bag that
> contains my columns.
>
> So, like in the README, I run:
>
> cols = FOREACH rows GENERATE flatten($1);
>
> As I understand it, when I flatten a bag, I get a set of tuples. When I
> dump cols, I get the following:
>
> (userName,someUserName1)
> (userName,someUserName2)
> (userName,someUserName3)
>
> If I continue with the README, I would run colnames = FOREACH cols GENERATE
> $0 to give me the column names.
>
> I'm a little confused why I only get column names - when I do a describe on
> cols, I get the following:
>
> cols: {bytearray}
>
> It seems like $0 should be the entire line (userName,someUserName1), not
> just the column name.
>
> Anyway, what I really want is the column value, not the name. Is there a
> way to do that? I listed all of the failed attempts I made below.
>
>    - colnames = FOREACH cols GENERATE $1 and was told $1 was out of
>    bounds.
>    - casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0; but
>    all I got back were empty tuples
>    - values = FOREACH cols GENERATE $0.$1; but I got an error telling me
>    a data byte array can't be cast to a tuple
>
> So I'm stuck - any help would be greatly appreciated.
>
> Thanks!
>
> Eric.


-- 
WonderAffect

http://www.wonderaffect.com
http://www.wonderaffect.com/blog
