cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API
Date Fri, 27 Mar 2015 21:30:53 GMT


Aleksey Yeschenko commented on CASSANDRA-8358:

Pushed a squashed version based on latest trunk to
with no changes.

So far things mostly look good. I'd like to do a few more cosmetic things, however.

1. {{AbstractColumnFamilyRecordWriter}} is a small class, and both the (deprecated) {{ColumnFamilyRecordWriter}}
and {{CqlRecordWriter}} extend it. It also has Thrift-specific logic. So I would prefer the
abstract class to go away entirely, with the shared bits duplicated, if needed, in
{{ColumnFamilyRecordWriter}} and {{CqlRecordWriter}}.
2. Same for {{AbstractColumnFamilyOutputFormat}}
3. Same for {{AbstractColumnFamilyInputFormat}}. At the very least it shouldn't include Thrift-only
functionality ({{createAuthenticatedClient}}); ideally I'd like to get rid of the abstract
class and have {{ColumnFamilyInputFormat}} and {{CqlInputFormat}} duplicate the shared bits.
4. Same for {{AbstractBulkRecordWriter}} - more than half the class is Thrift code. Plus,
shouldn't the old {{BulkRecordWriter}} be {{@Deprecated}} too?
5. Same for {{AbstractBulkOutputFormat}} and deprecation of {{BulkOutputFormat}} itself (right
now both its methods are deprecated individually)
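
To illustrate the deprecation point in items 4 and 5, here is a minimal sketch of the difference between deprecating every method individually and deprecating the class itself. The class names below are hypothetical stand-ins, not the actual Cassandra Hadoop classes, and the reflection helpers exist only to make the difference visible.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in: only the methods carry @Deprecated, so code that
// merely references or extends the type is not flagged by the compiler.
class MethodLevelLegacyWriter {
    @Deprecated
    public void write(String key) { }

    @Deprecated
    public void close() { }
}

// Hypothetical stand-in: the annotation sits on the class, so any use of the
// type (declaration, instantiation, subclassing) is flagged as deprecated.
@Deprecated
class ClassLevelLegacyWriter {
    public void write(String key) { }

    public void close() { }
}

@SuppressWarnings("deprecation")
public class DeprecationSketch {
    // True if the class itself is annotated with @Deprecated.
    static boolean classDeprecated(Class<?> c) {
        return c.isAnnotationPresent(Deprecated.class);
    }

    // Counts the declared methods that are individually annotated with @Deprecated.
    static long deprecatedMethodCount(Class<?> c) {
        long n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Deprecated.class)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("method-level: class deprecated = "
                + classDeprecated(MethodLevelLegacyWriter.class)
                + ", deprecated methods = "
                + deprecatedMethodCount(MethodLevelLegacyWriter.class));
        System.out.println("class-level:  class deprecated = "
                + classDeprecated(ClassLevelLegacyWriter.class)
                + ", deprecated methods = "
                + deprecatedMethodCount(ClassLevelLegacyWriter.class));
    }
}
```

With the annotation on the class, a later removal amounts to deleting every type the compiler flags as deprecated, rather than auditing individual methods.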

With all the {{*ColumnFamily*}} versions getting deprecated in this version, removing them
in 3.later would be as simple as rm-ing the non-CQL classes.

It would also be nice to get rid of "column family" naming everywhere in Cql* classes, in favor
of Table* - in method names and class names.

> Bundled tools shouldn't be using Thrift API
> -------------------------------------------
>                 Key: CASSANDRA-8358
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Philip Thompson
>             Fix For: 3.0
> In 2.1, we switched cqlsh to the python-driver.
> In 3.0, we got rid of cassandra-cli.
> Yet there is still code that's using legacy Thrift API. We want to convert it all to use the java-driver instead.
> 1. BulkLoader uses Thrift to query the schema tables. It should be using java-driver metadata APIs directly instead.
> 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
> 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
> 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
> 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
> Some of the things listed above use Thrift to get the list of partition key columns or clustering columns. Those should be converted to use the Metadata API of the java-driver.
> Somewhat related to that, we also have badly ported code from Thrift in o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches columns from schema tables instead of properly using the driver's Metadata API.
> We need all of it fixed. One exception, for now, is o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its describe_splits_ex() call that cannot be currently replaced by any java-driver call (?).
> Once this is done, we can stop starting Thrift RPC port by default in cassandra.yaml.

