incubator-cassandra-user mailing list archives

From John Lumby <johnlu...@hotmail.com>
Subject RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
Date Wed, 09 Oct 2013 22:33:13 GMT
----------------------------------------
> From: johnlumby@hotmail.com
> To: user@cassandra.apache.org
> Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
> Date: Wed, 9 Oct 2013 09:40:06 -0400
>
>     software versions : apache-cassandra-2.0.1    hadoop-2.1.0-beta
>
> I have been experimenting with using hadoop for a map/reduce operation on cassandra,
> writing the output through CqlOutputFormat.class.
> I based my first program fairly closely on the famous WordCount example in
> examples/hadoop_cql3_word_count,
> except that I set my output colfamily to have a bigint primary key:
>
> CREATE TABLE archive_recordids ( recordid bigint , count_num bigint, PRIMARY KEY (recordid))
>
> and simply tried setting this key as one of the keys in the output map:
>
>          keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));
>
> but it always failed with a strange error:
>
> java.io.IOException: InvalidRequestException(why:Key may not be empty)
>
I managed to get a little further: my M/R program now runs to completion
with output to the colfamily with the bigint primary key, and it actually does manage
to UPDATE a row.

query:

     String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num
= ? ";

reduce method:

        public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException
        {
            Long sum = 0L;
            Long recordid = writableRecid.get();
            List<ByteBuffer> vbles = null;
            byte[] longByterray = new byte[8];
            for(int i= 0; i < 8; i++) {
                longByterray[i] = (byte)(recordid>> (i * 8));
            }  
            ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
            recordIdByteBuf.wrap(longByterray);
            keys.put("recordid", recordIdByteBuf);
                      ...
            context.write(keys, vbles);
        }
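
For context, keys is a class-level Map<String, ByteBuffer> initialised as in the
WordCount example, and the elided part of reduce() just sums the values and binds
the result for the UPDATE's single "?".  Roughly (a sketch, not verbatim from my program):

        private Map<String, ByteBuffer> keys = new LinkedHashMap<String, ByteBuffer>();

        // inside reduce(), after putting the key into the map
        for (LongWritable val : values)
            sum += val.get();
        vbles = Collections.singletonList(ByteBufferUtil.bytes(sum.longValue()));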

My logger output does show it writing maps containing
what appear to be valid keys, e.g.

writing key : 0x4700000000407826 , hasarray ? : Y

There are about 74 mappings in the final reducer output,
each with a different numeric record key.

But after the program completes, there is just a single row in the columnfamily,
with a rowkey of 0 (zero).

SELECT * FROM archive_recordids LIMIT 999999999;

 recordid | count_num
----------+-----------
        0 |         2

(1 rows)


I guess it is something relating to the way my code is wrapping a long value into the ByteBuffer,
or maybe the way the ByteBuffer is being allocated.  As far as I can tell,
the ByteBuffer needs to be populated in exactly the same way as a thrift application
would populate a ByteBuffer for a bigint key  --  does anyone know how to do that,
or can you point me to an example that works?
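
For comparison, the alternatives I am considering for encoding the key are below
(a sketch; I believe the bigint validator expects the 8 bytes big-endian, which is
what java.nio buffers and these helpers produce, whereas my byte-shifting loop above
writes the low-order byte first):

    // org.apache.cassandra.utils.ByteBufferUtil -- same helper the WordCount example uses
    ByteBuffer keyBuf = ByteBufferUtil.bytes(recordid.longValue());

    // or via the CQL3 type, org.apache.cassandra.db.marshal.LongType
    ByteBuffer keyBuf2 = LongType.instance.decompose(recordid);

I also notice that ByteBuffer.wrap() is a static factory, so my
recordIdByteBuf.wrap(longByterray) call presumably just returns a new buffer and
leaves recordIdByteBuf zero-filled -- which might explain the single row with key 0.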

Thanks   John


>
> Cheers,   John