cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1368) Add output support for Hadoop Streaming
Date Wed, 25 Aug 2010 01:54:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902273#action_12902273 ]

Jonathan Ellis commented on CASSANDRA-1368:
-------------------------------------------

bq. Thrift and Avro serialization exist because JSON is not a nice way to deal with tons of
data (especially binary data). 

We've introduced support for annotating data with types, so you can represent a long as a
long and a uuid as a pretty string, instead of everything being opaque binary.

I worry that the Avro cure is worse than the disease.

bq. I think you are seriously underestimating the can of worms this would be, and it wouldn't
even get you timestamp support.

Maybe. But ColumnOrSuperColumn isn't a whole lot better, and has the drawback of inflicting
Yet Another Serialization Format on people to learn.

> Add output support for Hadoop Streaming
> ---------------------------------------
>
>                 Key: CASSANDRA-1368
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1368
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Stu Hood
>             Fix For: 0.7 beta 2
>
>         Attachments: 0001-Switch-to-Cloudera-s-Distribution-of-Hadoop.patch, 0002-Add-an-Avro-OutputReader-and-Resolver-for-Hadoop-Str.patch,
0003-Apply-the-deprecated-OutputFormat-interface-to-allow.patch, 0004-Add-Streaming-example-shell-scripts.patch
>
>
> Hadoop Streaming is a framework that allows mapreduce jobs to be written in languages
other than Java, by performing simple IPC on stdin/stdout.
> Adding output support for Hadoop Streaming to Cassandra would mean that users could write
very simple scripts in dynamic languages to load data into Cassandra. Once our Hadoop OutputFormat
has stabilized a bit, we might also be able to use this code to provide scalable bulk loading.
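
For readers unfamiliar with the stdin/stdout contract mentioned in the description, here is a minimal sketch of a Hadoop Streaming mapper in Python. It only illustrates the generic convention (one record per line, key and value separated by a tab); it is not the Avro-based output path proposed in the attached patches, and the function names map_lines and main are hypothetical.

```python
import sys


def map_lines(lines):
    """Yield one tab-separated key/value record per word in the input."""
    for line in lines:
        for word in line.split():
            # Hadoop Streaming treats text before the first tab as the key.
            yield f"{word}\t1"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # The framework feeds input records on stdin and collects stdout.
    for record in map_lines(stdin):
        stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

A script like this would be passed to the streaming jar as the mapper; writing the results into Cassandra would additionally depend on the output format this issue proposes.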

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

