cassandra-commits mailing list archives

From "Paul Pak (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
Date Wed, 09 Jul 2014 18:23:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056557#comment-14056557 ]

Paul Pak edited comment on CASSANDRA-6927 at 7/9/14 6:22 PM:
-------------------------------------------------------------

[~alexliu68] Hi Alex, thanks for your input. The fact that Hadoop properties aren't naturally
specific to a column family is precisely the reason for not having generic schema/insertStatement
properties and expecting them to apply to a particular column family, even if you happen to
be working with only one column family. If some property value only applies to a specific
column family, why not indicate it as such in the property key? It's certainly clearer and
safer.
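
To make the point about property keys concrete, here is a minimal sketch of what per-column-family keys could look like. Everything named here is an assumption for illustration: the class name, the key prefixes, and the helper methods are hypothetical and not taken from the attached patches, and java.util.Properties stands in for the Hadoop job configuration so the example runs on its own.

```java
import java.util.Properties;

// Hypothetical sketch: the class name and key prefixes are illustrative only.
public class PerColumnFamilyProps {
    // Assumed prefixes; the column family name is appended to form the full key.
    static final String SCHEMA_PREFIX = "example.output.schema.";
    static final String INSERT_PREFIX = "example.output.insert.";

    // Stores the schema under a key that names the column family explicitly.
    static void setColumnFamilySchema(Properties conf, String columnFamily, String schema) {
        conf.setProperty(SCHEMA_PREFIX + columnFamily, schema);
    }

    // Looks up the schema for one column family; fails loudly if none was set.
    static String getColumnFamilySchema(Properties conf, String columnFamily) {
        String schema = conf.getProperty(SCHEMA_PREFIX + columnFamily);
        if (schema == null) {
            throw new IllegalStateException("No schema configured for column family " + columnFamily);
        }
        return schema;
    }

    // Same pattern for the insert statement.
    static void setColumnFamilyInsertStatement(Properties conf, String columnFamily, String insert) {
        conf.setProperty(INSERT_PREFIX + columnFamily, insert);
    }

    static String getColumnFamilyInsertStatement(Properties conf, String columnFamily) {
        String insert = conf.getProperty(INSERT_PREFIX + columnFamily);
        if (insert == null) {
            throw new IllegalStateException("No insert statement configured for column family " + columnFamily);
        }
        return insert;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Two column families configured side by side without clobbering each other.
        setColumnFamilySchema(conf, "users", "CREATE TABLE ks.users (id int PRIMARY KEY, name text)");
        setColumnFamilySchema(conf, "events", "CREATE TABLE ks.events (id int PRIMARY KEY, payload text)");
        setColumnFamilyInsertStatement(conf, "users", "INSERT INTO ks.users (id, name) VALUES (?, ?)");
        System.out.println(getColumnFamilySchema(conf, "users"));
        System.out.println(getColumnFamilyInsertStatement(conf, "users"));
    }
}
```

Because the column family is part of the key, the same calls work whether a job writes to one column family or several, and a value set for one can never be silently picked up by another.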

Also, what would be the benefit of having overloaded set/getColumnFamily* methods? They require
additional validations to ensure the proper ones were used for the appropriate scenario, as
opposed to having unambiguous ones that don't require any validation and work in all cases.
The only possible benefit I can see is if there were a case where a column family was either
unknown or not applicable, but that will never be the case with these schema/insertStatement
properties.

In general, though, I prefer an approach where one solution works in all scenarios over one
that entails variations of settings/methods that apply differently in different scenarios.
The latter adds unnecessary complexity without any benefit and is prone to user confusion,
misuse, and error.


was (Author: sixpak32577):
[~alexliu68] Hi Alex, thanks for your input. The fact that Hadoop properties aren't naturally
specific to a column family is precisely the reason for not having generic schema/insertStatement
properties and expecting them to apply to a particular column family, even if you happen to
be working with only one column family. If some property value only applies to a specific
column family, why not indicate it as such in the property key? It's certainly clearer and
safer.

Also, what would be the benefit of having overloaded set/getColumnFamily* methods? They require
additional validations to ensure the proper ones were used for the appropriate scenario, as
opposed to having unambiguous ones that don't require any validation and work in all cases.
The only possible benefit I can see is if there was a case where a column family was either
unknown or not applicable, but that will never be the case with these schema/insertStatements
properties.

In general, I prefer an approach where one solution works in all scenarios over one that entails
variations of settings/methods that apply differently in different scenarios. It's adds unnecessary
complexity without any benefits and is prone to user confusion, misuse, and error.

> Create a CQL3 based bulk OutputFormat
> -------------------------------------
>
>                 Key: CASSANDRA-6927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Paul Pak
>            Priority: Minor
>              Labels: cql3, hadoop
>         Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927.txt
>
>
> This is the CQL-compatible version of BulkOutputFormat.  CqlOutputFormat exists, but
> doesn't write SSTables directly, similar to ColumnFamilyOutputFormat for Thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
