cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
Date Wed, 15 Apr 2015 13:55:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496197#comment-14496197
] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
-----------------------------------------------

CqlTableTest L336 and L368
{noformat}
count = 0;
while (it.hasNext())
{
    it.next();
    count ++;
}
{noformat}

Use Guava Iterators.size(it).

-------------------

Code style issues:
getToken, retrieveKeys: unused exceptions reported
getToken: too big and too nested for my taste

-------------------

retrieveKeys L492:
{noformat}
CqlRow cqlRow = result.rows.get(0);
{noformat}

Will fail in a very cryptic way if the keyspace / table doesn't exist.
It is good to give the user hints what went wrong.

-------------------

retrievKeys L503:
{noformat}
           for (CfDef cfDef : ksDef.cf_defs)
            {
                if (cfDef.name.equalsIgnoreCase(cfName))   
                {
                    CFMetaData cfMeta = ThriftConversion.fromThrift(cfDef);
{noformat}
Why equalsIgnoreCase?

------------------

retrieveKeys L512:
{noformat}
        return Pair.create(parseType(ByteBufferUtil.string(ByteBuffer.wrap(cqlRow.columns.get(1).getValue()))),
keys);
{noformat}
Code style: Expression too complex, too many nesting levels, hard to read.


------------------
getToken L410:
{noformat}
        int i = 0;
{noformat}
This should be declared in the first branch of the following if, because it is used only there,
in order not to pollute the wider scope.

------------------
{noformat}
    catch (Exception e)
        {
           //not a Terminal term
        }
{noformat}

Are you sure you really want to swallow all the exceptions here? Or did you have some specific
exception in mind like {{InvalidRequestException}}?
Swallowing exceptions by a very general catch-all clause is very dangerous.

------------------
getToken L456-L462:
{noformat}
            for (String key : validators.keySet())
                keyValues[i++] = eqColumns.get(key);
            IPartitioner partitioner = ConfigHelper.getInputPartitioner(conf);
            if (keyValidator instanceof CompositeType)
                return partitioner.getToken(((CompositeType) keyValidator).build(keyValues));
            else
                return partitioner.getToken(eqColumns.get(keys.get(0)));

{noformat}

validators is a HashMap and HashMaps do not preserve key order. The order of items in the
keyValues array here may not match the order of the key columns in the keyValidator, therefore
the values may be misplaced. If all key components are of the same type, this may fail in
a very subtle / silent way.

Besides that: Cassandra style of writing this would be to use a ternary operator:
{noformat}
return (keyValidator instanceof CompositeType) 
  ?  ...
  : ...
{noformat}

-----------------
getSplits L140-L147
{noformat}
  	try
        {
            token = getToken(conf);
        }
        catch (Exception e)
        {
            throw new IOException(e);
        }
{noformat}
Given that this change is going to be included in a patch version of Cassandra, we should
not increase the likelihood of failure here by throwing some additional exceptions, that previously
could never happen. If getting a token fails, we should log the failure with the exception
at ERROR level and continue without the token, because all this token thing is only an optimization.


-----------------

ColumnFamilySplit L74:
getPartitionKeyQuery should be called isPartitionKeyQuery

-----------------

SplitCallable#call L293:
{noformat}
                    if (containToken)
                        split.setPartitionKeyEqQuery(containToken);
{noformat}
Can be simplified to:
{noformat}
  split.setPartitionKeyEqQuery(true);
{noformat}

containToken is always true at the point of reaching the if statement.
Therefore you really don't need the containToken variable at all, and you can remove some
earlier code related to setting it as well.

==================================
Overall I vote against putting this into 2.1.5, because it is a too big feature which may
have effects on correctness and performance.

> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>             Fix For: 2.1.5
>
>         Attachments: 8576-2.1-branch.txt, 8576-trunk.txt
>
>
> I've heard reports from several users that they would like to have predicate pushdown
functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format can't pushdown
primary key predicate
> Current implementation requires a full table scan (since it can't recognize that a single
partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message