cassandra-commits mailing list archives

From "Brandon Williams (Resolved) (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-3928) Bulk loading to cassandra with Python Hadoop Job.
Date Fri, 17 Feb 2012 14:02:01 GMT


Brandon Williams resolved CASSANDRA-3928.

    Resolution: Won't Fix
      Reviewer:   (was: jbellis)

The only way to do this, short of reimplementing everything in Python, would be to use Jython
to write the sstables via BulkOutputFormat (BOF) and stream them in.  Alternatively, you could
insert the data via Thrift from CPython.

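As a rough illustration of the second option, here is a minimal sketch of a Hadoop Streaming
style reducer in CPython that groups rows into batches before handing them to a client. It is
not part of the ticket. The tab-separated record layout, the function names, and the
`send_batch` callable are all assumptions made for the example; `send_batch` stands in for
whatever Thrift-based client call actually performs the insert.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# (row key, {column name: value}); a hypothetical shape chosen for this sketch.
Row = Tuple[str, Dict[str, str]]


def parse_lines(lines: Iterable[str]) -> Iterator[Row]:
    """Parse 'key<TAB>column<TAB>value' records (an assumed input layout)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, column, value = line.split("\t", 2)
        yield key, {column: value}


def write_in_batches(
    rows: Iterable[Row],
    send_batch: Callable[[List[Row]], None],
    batch_size: int = 500,
) -> int:
    """Pass rows to send_batch in groups of at most batch_size; return the row count.

    send_batch is a placeholder for a real client call (for example, a wrapper
    around a Thrift batch insert); it is not a Cassandra API.
    """
    batch: List[Row] = []
    total = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            send_batch(batch)
            total += len(batch)
            batch = []  # start a fresh list so the sent batch is not mutated
    if batch:
        send_batch(batch)  # flush the final partial batch
        total += len(batch)
    return total
```

In a streaming job this would be driven with something like
`write_in_batches(parse_lines(sys.stdin), send_batch)`. Unlike the bulk-loading path, every row
here goes through the normal write path, so throughput will likely be lower than streaming
prebuilt sstables.
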
> Bulk loading to cassandra with Python Hadoop Job.
> -------------------------------------------------
>                 Key: CASSANDRA-3928
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop, Tools
>    Affects Versions: 1.2
>            Reporter: Samarth Gahire
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: bulkloader, hadoop, python, sstableloader
>             Fix For: 1.2
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> I was wondering if we could have an OutputFormat to bulk load data into Cassandra from a
> Hadoop job written in Python.
> I have a very complex Hadoop job written in Python that processes test data and generates
> structured data in a sequence file. I then read this data and stream it to Cassandra using
> BulkOutputFormat.
> Is there any way to avoid writing to the sequence file and instead process and stream the
> data directly to Cassandra (with a Hadoop job written in Python)?
> What would be a possible solution?

