cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
Date Sat, 09 May 2015 10:23:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536341#comment-14536341 ]

Piotr Kołaczkowski commented on CASSANDRA-8576:
-----------------------------------------------

Some comments were not addressed.
{noformat}
              boolean containToken;
                for (Range<Token> subrange : ranges)
                {
                    //make sure subrange contains the token
                    containToken = false;
                    if (token != null)
                    {
                        if (subrange.contains(token))
                            containToken = true;
                        else
                            continue;
                    }

                    ColumnFamilySplit split =
                            new ColumnFamilySplit(
                                    factory.toString(subrange.left),
                                    factory.toString(subrange.right),
                                    subSplit.getRow_count(),
                                    endpoints);

                    if (containToken)
                        split.setPartitionKeyEqQuery(containToken);
                    logger.debug("adding {}", split);
{noformat}
Multiple code smells in this fragment:
* The boolean flag is declared in a needlessly broad scope. A variable used only inside a loop should be declared inside the loop.
* The continue is controlled by a boolean flag.
* The if (containToken) is redundant: the code is equivalent without it.

I simplified it for you:
{noformat}
                for (Range<Token> subrange : ranges)
                {
                    boolean containsToken = token != null && subrange.contains(token);
                    if (token == null || containsToken)
                    {
                        ColumnFamilySplit split =
                            new ColumnFamilySplit(
                                factory.toString(subrange.left),
                                factory.toString(subrange.right),
                                subSplit.getRow_count(),
                                endpoints);
                        split.setPartitionKeyEqQuery(containsToken);
                        logger.debug("adding {}", split);
                        splits.add(split);
                    }
                }
{noformat}
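
To check that the two fragments above behave the same, here is a minimal, self-contained sketch. It does not use the Cassandra classes: TokenRange and Split are hypothetical stand-ins for Range<Token> and ColumnFamilySplit, tokens are plain longs, and wrap-around ranges are ignored. It also assumes that the partition-key flag on a freshly built split defaults to false, which is what makes the guarded setter call redundant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these types are the real Cassandra or Hadoop classes.
final class SplitFilterSketch
{
    // Simplified (left, right] range over long tokens, without wrap-around handling.
    record TokenRange(long left, long right)
    {
        boolean contains(long token)
        {
            return token > left && token <= right;
        }
    }

    // Stand-in for ColumnFamilySplit: only the bounds and the partition-key flag.
    record Split(long left, long right, boolean partitionKeyEqQuery) {}

    // Same control flow as the original fragment: flag declared outside the loop,
    // a continue in the else branch, and a setter call guarded by the flag.
    static List<Split> original(List<TokenRange> ranges, Long token)
    {
        List<Split> splits = new ArrayList<>();
        boolean containToken;
        for (TokenRange subrange : ranges)
        {
            containToken = false;
            if (token != null)
            {
                if (subrange.contains(token))
                    containToken = true;
                else
                    continue;
            }

            boolean flag = false; // assumed default of the flag on a new split
            if (containToken)
                flag = containToken;
            splits.add(new Split(subrange.left(), subrange.right(), flag));
        }
        return splits;
    }

    // Same control flow as the simplified fragment: loop-local flag, single condition.
    static List<Split> simplified(List<TokenRange> ranges, Long token)
    {
        List<Split> splits = new ArrayList<>();
        for (TokenRange subrange : ranges)
        {
            boolean containsToken = token != null && subrange.contains(token);
            if (token == null || containsToken)
                splits.add(new Split(subrange.left(), subrange.right(), containsToken));
        }
        return splits;
    }

    public static void main(String[] args)
    {
        List<TokenRange> ranges = List.of(new TokenRange(0, 10),
                                          new TokenRange(10, 20),
                                          new TokenRange(20, 30));
        // No token: every subrange becomes a split, none flagged.
        System.out.println(simplified(ranges, null));
        // Token given: only the subrange containing it survives, and it is flagged.
        System.out.println(simplified(ranges, 15L));
    }
}
```

With no token, both versions emit one split per subrange with the flag unset. With a token, both emit only the subrange that contains it, with the flag set.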





> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>             Fix For: 2.1.x
>
>         Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt
>
>
> I've heard reports from several users that they would like predicate pushdown functionality for Hadoop-based services (Hive in particular).
> Example use case:
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently, time to complete scales with the number of customers, since the Input Format can't push down the primary key predicate
> The current implementation requires a full table scan (since it can't recognize that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
