hadoop-user mailing list archives

From Andrew Martin <andrew.mar...@billforward.net>
Subject Output to Cassandra table with column of type "map"
Date Tue, 03 Jun 2014 16:53:57 GMT
I've written a number of MapReduce jobs using the CQL3 driver that allows input/output from/to
Cassandra column families.

The output from the Reducer has always been a Map<String, ByteBuffer> for the primary
key(s) and a List<ByteBuffer> for the values. This works fine for all data types that
can be converted easily to a ByteBuffer with "org.apache.cassandra.utils.ByteBufferUtil.bytes()",
such as double, float, int and String.
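
For reference, the key map / value list pair described above might be built roughly like this. It is a minimal sketch, not the CQL3 output format's own code: the bytes(...) helpers are stand-ins written with plain java.nio that mirror the big-endian layouts ByteBufferUtil.bytes() produces (UTF-8 for String, 4 bytes for int, 8 bytes for double), and the "word" column name is hypothetical. It assumes the usual setup where the list entries bind, in order, to the "?" markers of the configured output CQL statement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReducerOutputSketch {
    // Stand-ins for ByteBufferUtil.bytes(...): same layouts, but written with
    // plain java.nio so the sketch has no Cassandra dependency.
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static ByteBuffer bytes(int i) {
        return ByteBuffer.allocate(4).putInt(0, i); // absolute put keeps position at 0
    }

    static ByteBuffer bytes(double d) {
        return ByteBuffer.allocate(8).putDouble(0, d);
    }

    // Key map: primary-key column name -> serialized key value.
    static Map<String, ByteBuffer> keys(String word) {
        Map<String, ByteBuffer> keys = new HashMap<>();
        keys.put("word", bytes(word)); // "word" is a hypothetical column name
        return keys;
    }

    // Value list: one serialized entry per "?" in the output statement, in order.
    static List<ByteBuffer> values(int count) {
        List<ByteBuffer> values = new ArrayList<>();
        values.add(bytes(count));
        return values;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> k = keys("hadoop");
        List<ByteBuffer> v = values(42);
        // In a real Reducer this pair would go to context.write(k, v).
        System.out.println(k.get("word").remaining() + " " + v.get(0).getInt(0)); // prints "6 42"
    }
}
```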

Now I'd like to output data to a column in Cassandra that has the datatype "map", but I'm
not sure whether I should still pass it as an item in the List of ByteBuffers and, if so, how I'd
correctly serialize the map to bytes.
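
If the map does go in as a single entry of the List<ByteBuffer>, it would have to be serialized the way Cassandra serializes collections (Cassandra's own MapType handles this internally). A minimal hand-rolled sketch, assuming the collection layout of an element count followed by a (length, bytes) pair for each key and each value. The width of the count and lengths is an assumption to check against your Cassandra version: older versions use 2-byte shorts, while native protocol v3 uses 4-byte ints. packMap is a hypothetical helper, not a Cassandra API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapColumnSketch {
    // Packs already-serialized keys and values into one ByteBuffer:
    // [count][keyLen][key][valueLen][value]... with lengthSize-byte counts/lengths.
    // lengthSize: 2 (older short-based layout) or 4 (int-based layout); an assumption.
    static ByteBuffer packMap(Map<ByteBuffer, ByteBuffer> entries, int lengthSize) {
        if (lengthSize != 2 && lengthSize != 4) {
            throw new IllegalArgumentException("lengthSize must be 2 or 4");
        }
        int size = lengthSize; // room for the element count
        for (Map.Entry<ByteBuffer, ByteBuffer> e : entries.entrySet()) {
            size += 2 * lengthSize + e.getKey().remaining() + e.getValue().remaining();
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        writeLength(out, entries.size(), lengthSize);
        for (Map.Entry<ByteBuffer, ByteBuffer> e : entries.entrySet()) {
            writeLength(out, e.getKey().remaining(), lengthSize);
            out.put(e.getKey().duplicate()); // duplicate() leaves the caller's buffer untouched
            writeLength(out, e.getValue().remaining(), lengthSize);
            out.put(e.getValue().duplicate());
        }
        out.flip(); // ready to be read from position 0
        return out;
    }

    private static void writeLength(ByteBuffer out, int n, int lengthSize) {
        if (lengthSize == 2) {
            if (n > 0xFFFF) {
                throw new IllegalArgumentException("value too large for a 2-byte length: " + n);
            }
            out.putShort((short) n);
        } else {
            out.putInt(n);
        }
    }

    public static void main(String[] args) {
        // Hypothetical map<text, int> column: statistic name -> value.
        Map<ByteBuffer, ByteBuffer> m = new LinkedHashMap<>();
        m.put(ByteBuffer.wrap("count".getBytes(StandardCharsets.UTF_8)),
              ByteBuffer.allocate(4).putInt(0, 42));
        System.out.println(packMap(m, 4).remaining()); // prints 21
    }
}
```

The packed buffer would then be added to the value list like any other column value.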

My problem is like the traditional WordCount problem, only I need to output more than one
bit of data about the words (imagine I was storing, for each word, the number of times it
appeared in the text, the average length of the sentences it appears in, and the date of publication
of the oldest text it appears in). I can conceive of a solution with more than one column
family, but Cassandra appears to provide the map datatype to avoid this.
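
One way to avoid the map column, sketched under assumptions: a single CQL3 table can hold several typed columns, so the three per-word statistics could each get their own column in the same column family. The sketch below shows the reducer-side aggregation for one word and serializes each result for its own column (int as 4 bytes, double as 8 bytes, timestamp as 8-byte epoch milliseconds). The Occurrence type and the column names are hypothetical, and the Hadoop Reducer plumbing is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class WordStatsSketch {
    // One sighting of a word: length of its sentence and the publication
    // date of its text in epoch milliseconds. Hypothetical type.
    static final class Occurrence {
        final int sentenceLength;
        final long publishedMillis;

        Occurrence(int sentenceLength, long publishedMillis) {
            this.sentenceLength = sentenceLength;
            this.publishedMillis = publishedMillis;
        }
    }

    // Aggregates all occurrences of one word into three column values:
    // occurrence count (int), mean sentence length (double), oldest date (timestamp).
    static List<ByteBuffer> reduce(Iterable<Occurrence> occurrences) {
        int count = 0;
        long totalLength = 0;
        long oldest = Long.MAX_VALUE;
        for (Occurrence o : occurrences) {
            count++;
            totalLength += o.sentenceLength;
            oldest = Math.min(oldest, o.publishedMillis);
        }
        if (count == 0) {
            throw new IllegalArgumentException("no occurrences for this word");
        }
        double avgLength = (double) totalLength / count;

        List<ByteBuffer> values = new ArrayList<>();
        values.add(ByteBuffer.allocate(4).putInt(0, count));
        values.add(ByteBuffer.allocate(8).putDouble(0, avgLength));
        values.add(ByteBuffer.allocate(8).putLong(0, oldest));
        return values;
    }

    public static void main(String[] args) {
        List<Occurrence> occ = new ArrayList<>();
        occ.add(new Occurrence(10, 1000L));
        occ.add(new Occurrence(20, 500L));
        List<ByteBuffer> v = reduce(occ);
        System.out.println(v.get(0).getInt(0) + " " + v.get(1).getDouble(0) + " " + v.get(2).getLong(0)); // prints "2 15.0 500"
    }
}
```

With a hypothetical output statement such as `UPDATE word_stats SET occurrences = ?, avg_sentence_length = ?, oldest_publication = ?`, the three buffers would bind to the three markers in order, so no map column or second column family is needed.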

Is there a way to output to a Cassandra column of datatype Map, or a way to avoid having to
do so?


