incubator-cassandra-user mailing list archives

From Mubarak Seyed <mubarak.se...@gmail.com>
Subject Re: CassandraBulkLoader
Date Tue, 13 Jul 2010 17:31:52 GMT
Thanks Torsten.

Jonathan's blog post on Fact vs. Fiction says:

Fact: It has always been straightforward to send the output of Hadoop jobs
to Cassandra, and Facebook, Digg, and others have been using Hadoop like
this as a Cassandra bulk-loader for over a year.

Could anyone from Facebook or Digg share details on how to use
CassandraBulkLoader?

I found some details in Arin's presentation on Cassandra @ Digg about
loading data from MySQL -> Hadoop -> Cassandra.

Can someone please help me?

Thanks,
Mubarak

On Tue, Jul 13, 2010 at 1:27 AM, Torsten Curdt <tcurdt@vafer.org> wrote:

> On Tue, Jul 13, 2010 at 04:35, Mubarak Seyed <mubarak.seyed@gmail.com>
> wrote:
> > Where can I find the documentation for BinaryMemTable (bmt_example in contrib)
> > to use CassandraBulkLoader? What input must be supplied to CassandraBulkLoader?
> > How should the input data be formed, and what is its format?
>
> The code is the documentation I fear.
>
> I'll see if I get permission to get our updated code contributed.
> We added command-line fu and are using it to import large TSVs.
>
> > Do I need HDFS to store my storage-conf.xml?
>
> Why HDFS?
>
> The machine running the bulk loader joins the Cassandra ring, kind of
> like a temporary node.
> So you will need the storage-conf.xml on that machine.
>
> cheers
> --
> Torsten
>
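
Since the loader machine needs its own storage-conf.xml to join the ring, a quick local sanity check before starting a load might look like the sketch below. It is only an illustration: the helper name read_seeds is hypothetical, and it assumes the file lists seed nodes as <Seed> elements, as the storage-conf.xml shipped with Cassandra of this era appears to. It does not touch CassandraBulkLoader itself.

```python
import os
import xml.etree.ElementTree as ET


def read_seeds(conf_dir):
    """Return the seed addresses listed in conf_dir/storage-conf.xml.

    Hypothetical helper: raises FileNotFoundError when the loader machine
    has no local storage-conf.xml in the given directory.
    """
    path = os.path.join(conf_dir, "storage-conf.xml")
    if not os.path.isfile(path):
        raise FileNotFoundError("storage-conf.xml not found in " + conf_dir)
    root = ET.parse(path).getroot()
    # Assumption: seeds appear as <Seed> elements somewhere in the file.
    return [seed.text.strip() for seed in root.iter("Seed") if seed.text]
```

Calling read_seeds on the configuration directory of the loader machine should return the same seed addresses the cluster nodes use; an empty list or a missing file suggests the temporary node would not find the ring.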



-- 
Thanks,
Mubarak Seyed.
