Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Wed, 9 Jul 2014 18:23:05 +0000 (UTC)
From: "Paul Pak (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12703604.1395782677695.10015.1404930185178@arcas>
In-Reply-To: <JIRA.12703604.1395782677695@arcas>
References: <JIRA.12703604.1395782677695@arcas>
Subject: [jira] [Comment Edited] (CASSANDRA-6927) Create a CQL3 based bulk
 OutputFormat
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056557#comment-14056557 ] 

Paul Pak edited comment on CASSANDRA-6927 at 7/9/14 6:22 PM:
-------------------------------------------------------------

[~alexliu68] Hi Alex, thanks for your input. The fact that Hadoop properties aren't naturally specific to a column family is precisely why we shouldn't have generic schema/insertStatement properties that are expected to apply to a particular column family, even if you happen to be working with only one column family. If a property value only applies to a specific column family, why not indicate that in the property key? It's certainly clearer and safer.
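
To make the per-column-family keys concrete, here is a minimal sketch. It assumes a plain Map standing in for Hadoop's Configuration, and the key prefixes and method names (SCHEMA_PREFIX, INSERT_PREFIX, setColumnFamilySchema, etc.) are hypothetical illustrations, not the attached patch's actual API. The point it shows is that when the column family name is part of every key, two column families in one job can never collide and a lookup has only one failure mode.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration, and the
// key prefixes and method names are illustrative, not the patch's actual API.
final class CfScopedProperties {
    // The column family name is appended to each prefix, so every key names its CF.
    static final String SCHEMA_PREFIX = "cassandra.columnfamily.schema.";
    static final String INSERT_PREFIX = "cassandra.columnfamily.insert.";

    static void setColumnFamilySchema(Map<String, String> conf, String cf, String schema) {
        conf.put(SCHEMA_PREFIX + cf, schema);
    }

    static void setColumnFamilyInsertStatement(Map<String, String> conf, String cf, String insert) {
        conf.put(INSERT_PREFIX + cf, insert);
    }

    static String getColumnFamilySchema(Map<String, String> conf, String cf) {
        return require(conf, SCHEMA_PREFIX + cf, cf);
    }

    static String getColumnFamilyInsertStatement(Map<String, String> conf, String cf) {
        return require(conf, INSERT_PREFIX + cf, cf);
    }

    // A missing key can only mean "not configured for this CF"; there is no other case.
    private static String require(Map<String, String> conf, String key, String cf) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException("No value for '" + key + "' (column family " + cf + ")");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Two column families in one job: their settings cannot collide.
        setColumnFamilySchema(conf, "users", "CREATE TABLE ks.users (id int PRIMARY KEY, name text)");
        setColumnFamilyInsertStatement(conf, "users", "INSERT INTO ks.users (id, name) VALUES (?, ?)");
        setColumnFamilySchema(conf, "events", "CREATE TABLE ks.events (id int PRIMARY KEY, body text)");
        setColumnFamilyInsertStatement(conf, "events", "INSERT INTO ks.events (id, body) VALUES (?, ?)");

        System.out.println(getColumnFamilyInsertStatement(conf, "users"));
        System.out.println(getColumnFamilyInsertStatement(conf, "events"));
    }
}
```

Running main prints the two insert statements, one per column family; the same code works unchanged whether the job writes to one column family or several.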

Also, what would be the benefit of having overloaded set/getColumnFamily* methods? They require additional validation to ensure the right variant was used for each scenario, whereas unambiguous methods need no validation and work in all cases. The only possible benefit I can see is if there were a case where a column family was either unknown or not applicable, but that will never be the case with these schema/insertStatement properties.
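
The validation cost of the overloaded approach can be sketched as follows. This is a hypothetical contrast, assuming a Map in place of Hadoop's Configuration; GENERIC_SCHEMA, SCHEMA_PREFIX, schemaFromOverloaded and schemaFromScoped are illustrative names, not an existing Cassandra API. The overloaded reader must know how many column families the job writes before it can trust the generic key, while the scoped reader needs no such check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contrast: a Map stands in for Hadoop's Configuration; all key
// names and methods are illustrative, not an existing Cassandra API.
final class OverloadedVsScoped {
    static final String GENERIC_SCHEMA = "cassandra.columnfamily.schema";
    static final String SCHEMA_PREFIX = "cassandra.columnfamily.schema.";

    // Overloaded style: a CF-less setter writes one generic key...
    static void setColumnFamilySchema(Map<String, String> conf, String schema) {
        conf.put(GENERIC_SCHEMA, schema);
    }

    // ...alongside a CF-specific overload.
    static void setColumnFamilySchema(Map<String, String> conf, String cf, String schema) {
        conf.put(SCHEMA_PREFIX + cf, schema);
    }

    // Reading the overloaded settings needs extra validation: the generic key is
    // only safe when the job writes to exactly one column family.
    static String schemaFromOverloaded(Map<String, String> conf, String cf, int cfsInJob) {
        String specific = conf.get(SCHEMA_PREFIX + cf);
        if (specific != null) {
            return specific;
        }
        String generic = conf.get(GENERIC_SCHEMA);
        if (generic == null) {
            throw new IllegalStateException("No schema configured for " + cf);
        }
        if (cfsInJob != 1) {
            throw new IllegalStateException("Generic schema is ambiguous across " + cfsInJob + " column families");
        }
        return generic;
    }

    // Scoped style: one key per CF, valid in every scenario with no extra checks.
    static String schemaFromScoped(Map<String, String> conf, String cf) {
        String specific = conf.get(SCHEMA_PREFIX + cf);
        if (specific == null) {
            throw new IllegalStateException("No schema configured for " + cf);
        }
        return specific;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setColumnFamilySchema(conf, "CREATE TABLE ks.users (id int PRIMARY KEY)");
        System.out.println(schemaFromOverloaded(conf, "users", 1)); // accepted
        try {
            schemaFromOverloaded(conf, "users", 2); // same setting, different job shape
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The same generic setting is accepted or rejected depending on the job's shape, which is the scenario-dependent behavior the comment argues against.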

In general, though, I prefer an approach where one solution works in all scenarios over one that entails variations of settings/methods that apply differently in different scenarios. It adds unnecessary complexity without any benefits and is prone to user confusion, misuse, and error.



> Create a CQL3 based bulk OutputFormat
> -------------------------------------
>
>                 Key: CASSANDRA-6927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Paul Pak
>            Priority: Minor
>              Labels: cql3, hadoop
>         Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927.txt
>
>
> This is the CQL-compatible version of BulkOutputFormat. CqlOutputFormat exists, but, like ColumnFamilyOutputFormat for Thrift, it doesn't write SSTables directly.


--
This message was sent by Atlassian JIRA
(v6.2#6252)