Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CE71E17563 for ; Wed, 15 Apr 2015 13:55:01 +0000 (UTC) Received: (qmail 42333 invoked by uid 500); 15 Apr 2015 13:55:01 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 42302 invoked by uid 500); 15 Apr 2015 13:55:01 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 42291 invoked by uid 99); 15 Apr 2015 13:55:01 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Apr 2015 13:55:01 +0000 Date: Wed, 15 Apr 2015 13:55:01 +0000 (UTC) From: =?utf-8?Q?Piotr_Ko=C5=82aczkowski_=28JIRA=29?= To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=3Dcom.atlas= sian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D= 14496197#comment-14496197 ]=20 Piotr Ko=C5=82aczkowski commented on CASSANDRA-8576: ----------------------------------------------- CqlTableTest L336 and L368 {noformat} count =3D 0; while (it.hasNext()) { it.next(); count ++; } {noformat} Use Guava Iterators.size(it). ------------------- Code style issues: getToken, retrieveKeys: unused exceptions reported getToken: too big and too nested for my taste ------------------- retrieveKeys L492: {noformat} CqlRow cqlRow =3D result.rows.get(0); {noformat} Will fail in a very cryptic way if the keyspace / table doesn't exist. It is good to give the user hints what went wrong. ------------------- retrievKeys L503: {noformat} for (CfDef cfDef : ksDef.cf_defs) { if (cfDef.name.equalsIgnoreCase(cfName)) =20 { CFMetaData cfMeta =3D ThriftConversion.fromThrift(cfDef= ); {noformat} Why equalsIgnoreCase? ------------------ retrieveKeys L512: {noformat} return Pair.create(parseType(ByteBufferUtil.string(ByteBuffer.wrap(= cqlRow.columns.get(1).getValue()))), keys); {noformat} Code style: Expression too complex, too many nesting levels, hard to read. ------------------ getToken L410: {noformat} int i =3D 0; {noformat} This should be declared in the first branch of the following if, because it= is used only there, in order not to pollute the wider scope. ------------------ {noformat} catch (Exception e) { //not a Terminal term } {noformat} Are you sure you really want to swallow all the exceptions here? Or did you= have some specific exception in mind like {{InvalidRequestException}}? Swallowing exceptions by a very general catch-all clause is very dangerous. ------------------ getToken L456-L462: {noformat} for (String key : validators.keySet()) keyValues[i++] =3D eqColumns.get(key); IPartitioner partitioner =3D ConfigHelper.getInputPartitioner(c= onf); if (keyValidator instanceof CompositeType) return partitioner.getToken(((CompositeType) keyValidator).= build(keyValues)); else return partitioner.getToken(eqColumns.get(keys.get(0))); {noformat} validators is a HashMap and HashMaps do not preserve key order. The order o= f items in the keyValues array here may not match the order of the key colu= mns in the keyValidator, therefore the values may be misplaced. If all key = components are of the same type, this may fail in a very subtle / silent wa= y. Besides that: Cassandra style of writing this would be to use a ternary ope= rator: {noformat} return (keyValidator instanceof CompositeType)=20 ? ... : ... {noformat} ----------------- getSplits L140-L147 {noformat} =09try { token =3D getToken(conf); } catch (Exception e) { throw new IOException(e); } {noformat} Given that this change is going to be included in a patch version of Cassan= dra, we should not increase the likelihood of failure here by throwing some= additional exceptions, that previously could never happen. If getting a to= ken fails, we should log the failure with the exception at ERROR level and = continue without the token, because all this token thing is only an optimiz= ation.=20 ----------------- ColumnFamilySplit L74: getPartitionKeyQuery should be called isPartitionKeyQuery ----------------- SplitCallable#call L293: {noformat} if (containToken) split.setPartitionKeyEqQuery(containToken); {noformat} Can be simplified to: {noformat} split.setPartitionKeyEqQuery(true); {noformat} containToken is always true at the point of reaching the if statement. Therefore you really don't need the containToken variable at all, and you c= an remove some earlier code related to setting it as well. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D Overall I vote against putting this into 2.1.5, because it is a too big fea= ture which may have effects on correctness and performance. > Primary Key Pushdown For Hadoop > ------------------------------- > > Key: CASSANDRA-8576 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 > Project: Cassandra > Issue Type: Improvement > Components: Hadoop > Reporter: Russell Alexander Spitzer > Assignee: Alex Liu > Fix For: 2.1.5 > > Attachments: 8576-2.1-branch.txt, 8576-trunk.txt > > > I've heard reports from several users that they would like to have predic= ate pushdown functionality for hadoop (Hive in particular) based services.= =20 > Example usecase > Table with wide partitions, one per customer > Application team has HQL they would like to run on a single customer > Currently time to complete scales with number of customers since Input Fo= rmat can't pushdown primary key predicate > Current implementation requires a full table scan (since it can't recogni= ze that a single partition was specified) -- This message was sent by Atlassian JIRA (v6.3.4#6332)