cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: CassandraBulkLoader
Date Tue, 13 Jul 2010 20:06:38 GMT
Look at contrib/bmt_example, with the caveat that using it is usually
premature optimization.

On Tue, Jul 13, 2010 at 12:31 PM, Mubarak Seyed <mubarak.seyed@gmail.com> wrote:
> Thanks Torsten.
> Jonathan's "Fact vs. Fiction" blog post says:
> Fact: It has always been straightforward to send the output of Hadoop jobs
> to Cassandra, and Facebook, Digg, and others have been using Hadoop like
> this as a Cassandra bulk-loader for over a year.
> Can anyone from Facebook or Digg share details on how to use the Cassandra
> BulkLoader?
> I found some details in Arin's presentation on Cassandra @ Digg about
> loading data from MySQL -> Hadoop -> Cassandra.
> Can someone please help me?
> Thanks,
> Mubarak
>
> On Tue, Jul 13, 2010 at 1:27 AM, Torsten Curdt <tcurdt@vafer.org> wrote:
>>
>> On Tue, Jul 13, 2010 at 04:35, Mubarak Seyed <mubarak.seyed@gmail.com>
>> wrote:
>> > Where can I find the documentation for BinaryMemTable (bmt_example in
>> > contrib)
>> > to use CassandraBulkLoader? What input needs to be supplied to
>> > CassandraBulkLoader?
>> > How do I form the input data, and what format should it be in?
>>
>> The code is the documentation I fear.
>>
>> I'll see if I can get permission to contribute our updated code.
>> We added some command-line fu and are using it to import large TSVs.
>>
>> > Do I need HDFS to store my storage-conf.xml?
>>
>> Why HDFS?
>>
>> The machine running the bulk loader joins the Cassandra ring, kind of
>> like a temporary node.
>> So you will need storage-conf.xml on that machine.
>>
>> cheers
>> --
>> Torsten
>
>
>
> --
> Thanks,
> Mubarak Seyed.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
