cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7410) Pig support for BulkOutputFormat as a parameter in url
Date Wed, 15 Apr 2015 17:19:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496547#comment-14496547 ]

Piotr Kołaczkowski commented on CASSANDRA-7410:
-----------------------------------------------

org/apache/cassandra/hadoop/pig/CqlNativeStorage.java:149
{noformat}
        if (t.getType(0) == DataType.TUPLE)
        {
            if (bulkOutputFormat)
            {
                cqlQueryFromTuple(null, t, 0);
            }
            else if (t.getType(1) == DataType.TUPLE)
            {
                Map<String, ByteBuffer> key = tupleToKeyMap((Tuple)t.get(0));
                cqlQueryFromTuple(key, t, 1);
            }
            else
                throw new IOException("Second argument in output must be a tuple");
        }
        else
            throw new IOException("First argument in output must be a tuple");
{noformat}

Personally, I don't like this input validation style.
It is much better to validate the input up front, as a flat sequence of checks:

{noformat}
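if (t.getType(0) != DataType.TUPLE)
    throw new IOException("First argument in output must be a tuple");
if (!bulkOutputFormat && t.getType(1) != DataType.TUPLE)
    throw new IOException("Second argument in output must be a tuple");

// now we know the input is OK, so we can focus on doing the real work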
{noformat}

Moreover, {{cqlQueryFromTuple}} repeats the same validation.
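A minimal, self-contained sketch of the flat style, validating once at the entry point so the helper does not need to re-check anything. To keep it runnable without Pig, a {{List<Object>}} stands in for {{Tuple}} (a nested list plays the role of an inner tuple), and {{putNext}}/{{buildQuery}} here are hypothetical simplifications, not the patch's actual code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch of flat, up-front input validation. Pig's Tuple/DataType are replaced
// by a hypothetical List<Object> stand-in so the example runs without Pig;
// nested List values play the role of inner tuples.
public class FlatValidationSketch
{
    // Validate once at the entry point; every check is a guard clause that throws.
    static String putNext(List<Object> t, boolean bulkOutputFormat) throws IOException
    {
        if (t.isEmpty() || !(t.get(0) instanceof List))
            throw new IOException("First argument in output must be a tuple");
        if (!bulkOutputFormat && (t.size() < 2 || !(t.get(1) instanceof List)))
            throw new IOException("Second argument in output must be a tuple");

        // Input is known to be well-formed here, so the helper does not re-check it.
        return bulkOutputFormat ? buildQuery(t, 0) : buildQuery(t, 1);
    }

    // Hypothetical stand-in for cqlQueryFromTuple: assumes already-validated input.
    static String buildQuery(List<Object> t, int offset)
    {
        return "values from position " + offset + ": " + t.get(offset);
    }

    public static void main(String[] args) throws IOException
    {
        List<Object> keyAndValues = Arrays.asList(Arrays.asList("k1"), Arrays.asList(1, 2));
        System.out.println(putNext(keyAndValues, false));
    }
}
```

The second guard is skipped in bulk mode because, as in the original code, only the first tuple is used there.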

------------

org.apache.cassandra.hadoop.pig.CqlNativeStorage#setStoreLocation:

This method is a copy-paste of
org.apache.cassandra.hadoop.pig.CqlStorage#setStoreLocation
with only a small bulkOutputFormat-specific section added.

Is there any reason not to call super.setStoreLocation()?
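Roughly what I mean, as a sketch with hypothetical {{BaseStorage}}/{{NativeStorage}} classes standing in for {{CqlStorage}}/{{CqlNativeStorage}}; the {{bulk_output_format=true}} URL parameter is an assumed name, not necessarily the one the patch uses:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of reusing a parent implementation instead of copy-pasting it.
// BaseStorage / NativeStorage and the bulk_output_format parameter are
// hypothetical stand-ins; only the shape of the override matters.
public class SuperDelegationSketch
{
    static class BaseStorage
    {
        final List<String> configured = new ArrayList<>();

        // Shared location parsing and job configuration live in one place.
        void setStoreLocation(String location)
        {
            configured.add("keyspace/table from " + location);
        }
    }

    static class NativeStorage extends BaseStorage
    {
        boolean bulkOutputFormat;

        @Override
        void setStoreLocation(String location)
        {
            super.setStoreLocation(location);   // reuse the common logic
            // Only the subclass-specific part is added here.
            if (location.contains("bulk_output_format=true"))
            {
                bulkOutputFormat = true;
                configured.add("bulk output format");
            }
        }
    }

    public static void main(String[] args)
    {
        NativeStorage s = new NativeStorage();
        s.setStoreLocation("cql://ks/tbl?bulk_output_format=true");
        System.out.println(s.configured);
    }
}
```

This assumes the parent's URL parsing tolerates the extra parameter; if it doesn't, overriding just the parsing step is still preferable to duplicating the whole method.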

------------

org/apache/cassandra/io/sstable/CQLSSTableWriter.java:
{noformat}
                try
                {
                    Schema.instance.load(ksm);
                }
                catch (Exception e)
                {
                    //It may get an exception of Attempting to load already loaded column family
                }
{noformat}
OK, I get it, but what if it tries to load it for the first time and fails? It doesn't even
tell the user that something went wrong, or why.
Also, can you explain why it may try to load the same column family multiple times?
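One way to handle it, sketched with a hypothetical {{SchemaRegistry}} (not Cassandra's {{Schema}} API): check explicitly for the expected "already loaded" case and let any other failure propagate with context instead of swallowing it.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of loading schema metadata idempotently without swallowing failures.
// SchemaRegistry and its methods are hypothetical; they are not Cassandra's
// Schema API, only an illustration of the pattern.
public class SchemaLoadSketch
{
    static class SchemaRegistry
    {
        private final Set<String> loaded = new HashSet<>();

        boolean isLoaded(String table) { return loaded.contains(table); }

        void load(String table)
        {
            if (table.isEmpty())
                throw new IllegalArgumentException("empty table name");
            if (!loaded.add(table))
                throw new IllegalStateException("Attempting to load already loaded table " + table);
        }
    }

    // Skip the expected "already loaded" case explicitly; any other failure
    // propagates with context instead of being silently ignored.
    static void ensureLoaded(SchemaRegistry registry, String table)
    {
        synchronized (registry)
        {
            if (registry.isLoaded(table))
                return;
            try
            {
                registry.load(table);
            }
            catch (RuntimeException e)
            {
                throw new RuntimeException("Failed to load schema for " + table, e);
            }
        }
    }

    public static void main(String[] args)
    {
        SchemaRegistry registry = new SchemaRegistry();
        ensureLoaded(registry, "ks.tbl");
        ensureLoaded(registry, "ks.tbl"); // second call is a no-op
        System.out.println(registry.isLoaded("ks.tbl"));
    }
}
```

The check and the load run under one lock, so two concurrent writers cannot both pass the check and trigger the "already loaded" error.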


> Pig support for BulkOutputFormat as a parameter in url
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7410
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7410
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>            Priority: Minor
>             Fix For: 2.0.15
>
>         Attachments: 7410-2.0-branch.txt, 7410-2.1-branch.txt, 7410-v2-2.0-branch.txt, 7410-v3-2.0-branch.txt, CASSANDRA-7410-v2-2.1-branch.txt
>
>
> Add BulkOutputFormat support in Pig url



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
