cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8609) Remove depency of hadoop to internals (Cell/CellName)
Date Mon, 01 Jun 2015 11:37:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567188#comment-14567188 ]

Sam Tunnicliffe commented on CASSANDRA-8609:
--------------------------------------------

I don't think we can completely remove the dependency on internal classes in this way, as it
would remove the ability to write M/R jobs which use timestamp and TTL. While the patch doesn't
break any of the bundled Pig or Hadoop examples, it's feasible for jobs out in the wild to be
relying on those values.

I think the right thing to do is to create a new simple class in the {{org.apache.cassandra.hadoop}}
package to represent a column (much like the old {{org.apache.cassandra.db.Column}} from 2.0)
and use that throughout the thrift side of the hadoop integration. The {{ColumnFamilyRecordReader#unthriftifyX}}
methods should then be translating from the thrift classes into these new simple columns.
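
As a rough illustration of the kind of class meant here, a minimal sketch follows. The name {{SimpleColumn}}, its accessors, and the convention that a TTL of 0 means "no TTL" are all assumptions made for this example, not taken from any attached patch. The package declaration is omitted so the snippet compiles on its own.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical sketch only: the proposal would place a class like this in
// org.apache.cassandra.hadoop. Name and accessors are illustrative assumptions.
final class SimpleColumn
{
    private final ByteBuffer name;
    private final ByteBuffer value;
    private final long timestamp;
    private final int ttl; // assumption: 0 means no TTL was set

    SimpleColumn(ByteBuffer name, ByteBuffer value, long timestamp, int ttl)
    {
        // Read-only views keep callers from mutating the stored bytes through this object.
        this.name = Objects.requireNonNull(name, "name").asReadOnlyBuffer();
        this.value = Objects.requireNonNull(value, "value").asReadOnlyBuffer();
        this.timestamp = timestamp;
        this.ttl = ttl;
    }

    // Duplicates are returned so that reads by one caller do not move the
    // position/limit seen by another.
    ByteBuffer name() { return name.duplicate(); }
    ByteBuffer value() { return value.duplicate(); }

    long timestamp() { return timestamp; }
    int ttl() { return ttl; }
    boolean hasTtl() { return ttl > 0; }
}
```

With something along these lines, the {{unthriftifyX}} methods could copy the name, value, timestamp and ttl out of each thrift column into the new type, so M/R jobs keep access to timestamp and ttl without touching {{Cell}} or {{CellName}}.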

Also, the utility of {{AbstractCassandraStorage}} isn't clear to me. {{CassandraStorage}}
doesn't extend it and I can't find any reference to it in the project at all (i.e. it isn't
being tested/exercised by any of the demos as far as I can tell). Is there any reason why
users writing their own {{LoadStoreFunc}} would choose to extend {{ACS}} rather than {{CS}}?
At the very least, shouldn't it be marked deprecated like {{CS}}?

> Remove depency of hadoop to internals (Cell/CellName)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8609
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8609
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Philip Thompson
>             Fix For: 2.2.0 rc1
>
>         Attachments: 8609-2.2-2.txt, 8609-2.2.txt, CASSANDRA-8609-3.0-branch.txt
>
>
> For some reason most of the Hadoop code (ColumnFamilyRecordReader, CqlStorage, ...) uses
the {{Cell}} and {{CellName}} classes. That dependency is entirely artificial: all this code
is really client code that communicates with Cassandra over the thrift/native protocol, and there
is thus no reason for it to use internal classes. In fact, those classes are used in
a very crude way, really just as a {{Pair<ByteBuffer, ByteBuffer>}}.
> But this dependency is really painful when we make changes to the internals. Further,
every time we do so, I believe we break some of those APIs due to the change. This was
painful for CASSANDRA-5417 and is now painful for CASSANDRA-8099. While I somewhat
hacked over it in CASSANDRA-5417, that was a mistake and we should have removed the dependency
back then. So let's do that now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
