Date: Mon, 27 Aug 2012 13:09:57 +0200
Subject: Understanding Cassandra + MapReduce + Composite Columns
From: Víctor Penela
To: user@cassandra.apache.org

Hi!

I'm trying to use Hadoop's MapReduce on top of a Cassandra environment, and I've run into some issues while using composite columns. I'm currently using Cassandra 1.1.2 (I wouldn't mind having to update it) and Hadoop 1.0.3 (I'd rather keep this version).

What I would like to do is send slices split by the first component of the composite key and do some processing that takes the remaining components of the composite key (as well as other columns) into account.

I've built a sandbox keyspace with some column families in order to test this:

    CREATE TABLE test_1 (
      field1 text,
      field2 text,
      field3 text,
      field4 text,
      PRIMARY KEY (field1)
    );

    CREATE TABLE test_2 (
      field1 text,
      field2 text,
      field3 text,
      field4 text,
      PRIMARY KEY (field1, field2)
    );

The job configuration (the elements relevant to Cassandra) is as follows:

    // Cassandra config
    ConfigHelper.setInputRpcPort(conf, "9160");
    ConfigHelper.setInputInitialAddress(conf, "localhost");
    ConfigHelper.setInputPartitioner(conf, "ByteOrderedPartitioner");
    ConfigHelper.setInputColumnFamily(conf, KEYSPACE, INPUT_COLUMN_FAMILY);

    // Slice over all columns of each row, capped at 5 columns per row
    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange()
        .setStart(ByteBufferUtil.EMPTY_BYTE_BUFFER)
        .setFinish(ByteBufferUtil.EMPTY_BYTE_BUFFER)
        .setCount(5));
    ConfigHelper.setInputSlicePredicate(conf, predicate);

My dummy map only tries to log the different keys and values it receives:

    map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, OutputCollector<Text, IntWritable> output, Reporter reporter) { ... }

With CF test_1 everything seems to work fine.

With CF test_2, I only receive the field1 value inside the ByteBuffer key. The rest of the composite key seems to be encoded into each key of the SortedMap, together with the name of that particular column (field3, field4, ...), but I don't know exactly how to extract it (I'm a bit new to ByteBuffers, so any help there will be welcome :)). Is there any way to specify the schema of this particular CF at the MR level, so that I can extract the secondary key?

Thanks!
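As a reference point for the question above, here is a minimal sketch of splitting such a composite column name by hand. It assumes the layout documented for Cassandra's CompositeType: each component is a 2-byte unsigned length, then the value bytes, then one end-of-component byte. The class CompositeNameDecoder and its decode and encode methods are hypothetical names used for illustration, not part of any Cassandra or Hadoop API. The sketch also assumes every component is UTF-8 text, as with the text fields in test_2, and encode exists only so that the sketch can be exercised without a cluster.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Cassandra class. Assumes the CompositeType layout:
// <2-byte unsigned length><value bytes><1-byte end-of-component>, repeated.
final class CompositeNameDecoder {

    /** Splits an encoded composite name into its components, read as UTF-8 text. */
    public static List<String> decode(ByteBuffer name) {
        // Work on a duplicate so the caller's position and limit are left untouched.
        ByteBuffer buf = name.duplicate();
        List<String> parts = new ArrayList<>();
        while (buf.remaining() >= 3) {
            int length = buf.getShort() & 0xFFFF; // read the length as unsigned 16-bit
            byte[] value = new byte[length];
            buf.get(value);                       // component bytes
            buf.get();                            // end-of-component byte, ignored here
            parts.add(new String(value, StandardCharsets.UTF_8));
        }
        return parts;
    }

    /** Builds an encoded name in the same assumed layout, for testing only. */
    public static ByteBuffer encode(String... components) {
        byte[][] raw = new byte[components.length][];
        int size = 0;
        for (int i = 0; i < components.length; i++) {
            raw[i] = components[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + raw[i].length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] r : raw) {
            out.putShort((short) r.length);
            out.put(r);
            out.put((byte) 0); // end-of-component marker
        }
        out.flip(); // make the buffer readable from the start
        return out;
    }
}
```

In the mapper, each key of the SortedMap could be passed to decode. For test_2, and assuming the encoding described in the message, the first element would be expected to hold the field2 value and the last one the column name (field3, field4, ...).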