cassandra-commits mailing list archives

From "Paul Pak (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
Date Tue, 26 Aug 2014 15:53:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110861#comment-14110861 ]

Paul Pak commented on CASSANDRA-7827:
-------------------------------------

Usage example:
{code}
String cf = "my_cf";  // the underscore makes this name invalid as a MultipleOutputs named output
String alias = "myCfAlias";

// set properties for CqlBulkOutputFormat as usual
CqlBulkOutputFormat.setColumnFamilySchema(conf, cf, "CREATE TABLE my_cf ...");
CqlBulkOutputFormat.setColumnFamilyInsertStatement(conf, cf, "INSERT INTO my_cf...");
// set the alias
CqlBulkOutputFormat.setColumnFamilyAlias(conf, alias, cf);
// interactions with MultipleOutputs should be done using the alias
MultipleOutputs.addNamedOutput(job, alias, CqlBulkOutputFormat.class, Object.class, List.class);

...

// again, interactions with MultipleOutputs should be done using the alias, so...
multipleOutputs.write(alias, null, byteBufferList);

{code}

> Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
> -------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7827
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7827
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Paul Pak
>            Assignee: Paul Pak
>            Priority: Minor
>              Labels: cql3, hadoop
>         Attachments: trunk-7827-v1.txt
>
>
> When using MultipleOutputs with CqlBulkOutputFormat, the column family names that can be
> used as outputs are restricted to alphanumeric characters by the logic in
> MultipleOutputs.checkNamedOutputName(). This improvement provides a way to alias any column
> family name to a MultipleOutputs-compatible output name, so that column family names are
> not artificially restricted when using MultipleOutputs.
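
The restriction and the aliasing idea described above can be illustrated with a minimal, self-contained sketch. It does not use Hadoop or Cassandra classes: `OutputAliasSketch`, `isValidNamedOutput`, `setAlias` and `resolve` are hypothetical names invented for illustration, the alphanumeric check only approximates what MultipleOutputs enforces, and the in-memory map is an assumption, not how the attached patch stores aliases.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; none of these names come from Hadoop or Cassandra.
class OutputAliasSketch {

    // Approximation (assumption) of an alphanumeric-only named-output check.
    static boolean isValidNamedOutput(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean alphanumeric = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!alphanumeric) {
                return false;
            }
        }
        return true;
    }

    // Maps a MultipleOutputs-compatible alias to the real column family name.
    private final Map<String, String> aliasToColumnFamily = new HashMap<>();

    // Registers an alias; the alias itself must pass the name check.
    void setAlias(String alias, String columnFamily) {
        if (!isValidNamedOutput(alias)) {
            throw new IllegalArgumentException("Alias is not a valid output name: " + alias);
        }
        aliasToColumnFamily.put(alias, columnFamily);
    }

    // Returns the column family behind an alias, or the name itself if no alias is set.
    String resolve(String outputName) {
        return aliasToColumnFamily.getOrDefault(outputName, outputName);
    }

    public static void main(String[] args) {
        OutputAliasSketch sketch = new OutputAliasSketch();
        sketch.setAlias("myCfAlias", "my_cf");
        System.out.println(isValidNamedOutput("my_cf"));
        System.out.println(isValidNamedOutput("myCfAlias"));
        System.out.println(sketch.resolve("myCfAlias"));
    }
}
```

In this sketch the job only ever hands the alias to the name check, and the output side translates the alias back to the real column family name before writing.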



--
This message was sent by Atlassian JIRA
(v6.2#6252)
