cassandra-commits mailing list archives

From "Robbie Strickland (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4208) ColumnFamilyOutputFormat should support writing to multiple column families
Date Fri, 21 Sep 2012 20:19:07 GMT


Robbie Strickland commented on CASSANDRA-4208:

[~mkjellman] your usage is correct.  What this patch actually does is change ConfigHelper
so that set/getColumnFamily() operate on the mapreduce.output.basename key that MultipleOutputs
(and FileInput/OutputFormat) use when looking for outputs.  This is a bit hacky but
unavoidable, since the methods to alter this through the Hadoop API are inaccessible.  I have a
related ticket on the Hadoop side to change this and make it more generic, but until then
this will have to do.
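
To illustrate the idea, here is a minimal sketch, not the patch itself.  It assumes a plain
Map as a stand-in for Hadoop's Configuration, and the class and method names
(OutputBasenameSketch, setOutputColumnFamily, getOutputColumnFamily) are hypothetical; only
the mapreduce.output.basename key comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: store the output column family under the same
// key that MultipleOutputs reads, so both sides agree on the output name.
public class OutputBasenameSketch {
    // Key named in the comment above; everything else here is an assumption.
    static final String OUTPUT_BASENAME_KEY = "mapreduce.output.basename";

    // Stand-in for a ConfigHelper-style setter; a Map replaces Hadoop's Configuration.
    static void setOutputColumnFamily(Map<String, String> conf, String columnFamily) {
        conf.put(OUTPUT_BASENAME_KEY, columnFamily);
    }

    // Stand-in for the matching getter; returns null if nothing was set.
    static String getOutputColumnFamily(Map<String, String> conf) {
        return conf.get(OUTPUT_BASENAME_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setOutputColumnFamily(conf, "users");
        // Anything that reads the basename key now sees the column family name.
        System.out.println(conf.get(OUTPUT_BASENAME_KEY));
    }
}
```

Because the getter and setter share one key with the multiple-outputs machinery, whichever
named output is being written determines what getOutputColumnFamily() returns in this sketch.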
> ColumnFamilyOutputFormat should support writing to multiple column families
> ---------------------------------------------------------------------------
>                 Key: CASSANDRA-4208
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 1.1.0
>            Reporter: Robbie Strickland
>         Attachments: cassandra-1.1-4208.txt, cassandra-1.1-4208-v2.txt, cassandra-1.1-4208-v3.txt,
cassandra-1.1-4208-v4.txt, trunk-4208.txt, trunk-4208-v2.txt
> It is not currently possible to output records to more than one column family in a single
reducer.  Considering that writing values to Cassandra often involves multiple column families
(e.g. updating your index when you insert a new value), this seems overly restrictive.  I
am submitting a patch that moves the specification of the column family from the job configuration
to the write() call in ColumnFamilyRecordWriter.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
