incubator-cassandra-user mailing list archives

From Víctor Penela <pen...@gmail.com>
Subject Understanding Cassandra + MapReduce + Composite Columns
Date Mon, 27 Aug 2012 11:09:57 GMT
Hi!

I'm trying to use Hadoop's MapReduce on top of a Cassandra environment, and
I'm running into some issues while using Composite Columns. I'm currently
using Cassandra 1.1.2 (I wouldn't mind having to update it) and Hadoop
1.0.3 (I'd rather keep this version).

What I would like to do is send slices split by the first component of the
composite key, and do some processing that takes into account the remaining
components of the composite key (as well as other columns).

I've built a sandbox keyspace with some column families in order to test
this:

CREATE TABLE test_1 (
  field1 text,
  field2 text,
  field3 text,
  field4 text,
  PRIMARY KEY (field1)
) ;
CREATE TABLE test_2 (
  field1 text,
  field2 text,
  field3 text,
  field4 text,
  PRIMARY KEY (field1, field2)
) ;

The Job configuration (the relevant elements for Cassandra) is as follows:
// Cassandra config
ConfigHelper.setInputRpcPort(conf, "9160");
ConfigHelper.setInputInitialAddress(conf, "localhost");
ConfigHelper.setInputPartitioner(conf, "ByteOrderedPartitioner");
ConfigHelper.setInputColumnFamily(conf, KEYSPACE, INPUT_COLUMN_FAMILY);

SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange()
    .setStart(ByteBufferUtil.EMPTY_BYTE_BUFFER)
    .setFinish(ByteBufferUtil.EMPTY_BYTE_BUFFER)
    .setCount(5));
ConfigHelper.setInputSlicePredicate(conf, predicate);

My dummy map only tries to log the different keys and values received:
map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
    OutputCollector<Text, IntWritable> output, Reporter reporter) { ... }

With CF test_1 everything seems to work fine.

With CF test_2, I only receive the field1 value inside the ByteBuffer key. The
rest of the composite key seems to be encoded into each key of the
SortedMap, together with the name of that particular column (field3,
field4, ...), but I don't know exactly how to extract it (I'm a bit new to
ByteBuffers, so any help there will be welcome :)). Is there any way to
specify the schema of this particular CF at the MR level, in order to be able
to extract the secondary key?
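
For reference, a minimal sketch of how such a SortedMap key could be unpacked with plain java.nio. It assumes the column names of test_2 use Cassandra's CompositeType layout, where each component is stored as a 2-byte big-endian length, the component bytes, and one end-of-component byte, and that every component is UTF-8 text (field2 value first, then the column name). The class CompositeNameDecoder and its encode helper are hypothetical names introduced only for this illustration; they are not part of Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Cassandra or Hadoop.
public class CompositeNameDecoder {

    // Splits a composite-encoded column name into its components.
    // Assumption: every component is UTF-8 text.
    public static List<String> decode(ByteBuffer name) {
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer buf = name.duplicate();
        List<String> parts = new ArrayList<String>();
        while (buf.remaining() > 0) {
            int length = buf.getShort() & 0xFFFF; // unsigned 2-byte length
            byte[] bytes = new byte[length];
            buf.get(bytes);                       // component value
            buf.get();                            // skip end-of-component byte
            parts.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return parts;
    }

    // Demo-only helper: builds a buffer in the layout assumed above.
    public static ByteBuffer encode(String... components) {
        byte[][] raw = new byte[components.length][];
        int size = 0;
        for (int i = 0; i < components.length; i++) {
            raw[i] = components[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + raw[i].length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] b : raw) {
            out.putShort((short) b.length);
            out.put(b);
            out.put((byte) 0);
        }
        out.flip(); // make the written bytes readable
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer name = encode("some-field2-value", "field3");
        System.out.println(decode(name)); // prints [some-field2-value, field3]
    }
}
```

Inside the map method, each key of the columns map could be passed to decode to recover the field2 value alongside the column name. If the Cassandra jars are already on the job's classpath, the CompositeType class in org.apache.cassandra.db.marshal is probably a more robust option than hand-rolled parsing.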

Thanks!
