Date: Tue, 13 Jul 2010 10:31:52 -0700
From: Mubarak Seyed
To: user@cassandra.apache.org
Subject: Re: CassandraBulkLoader

Thanks Torsten.

Jonathan's blog post on Fact vs. Fiction says:

Fact: It has always been straightforward to send the output of Hadoop jobs to Cassandra, and Facebook, Digg, and others have been using Hadoop like this as a Cassandra bulk-loader for over a year.

Could anyone from Facebook or Digg share details on how to use CassandraBulkLoader?

I found some details in Arin's presentation on Cassandra @ Digg about loading data from MySQL -> Hadoop -> Cassandra.

Can someone please help me?

Thanks,
Mubarak

On Tue, Jul 13, 2010 at 1:27 AM, Torsten Curdt <tcurdt@vafer.org> wrote:
> On Tue, Jul 13, 2010 at 04:35, Mubarak Seyed <mubarak.seyed@gmail.com> wrote:
> > Where can I find the documentation for BinaryMemTable (bmt_example in contrib)
> > to use CassandraBulkLoader? What input should be supplied to CassandraBulkLoader?
> > How should the input data be formed, and what format should it have?
>
> The code is the documentation, I fear.
>
> I'll see if I can get permission to contribute our updated code.
> We added command-line fu and are using it to import large TSVs.
>
> > Do I need HDFS to store my storage-conf.xml?
>
> Why HDFS?
>
> The machine running the bulk loader joins the Cassandra ring, kind of
> like a temporary node, so you will need storage-conf.xml on that machine.
>
> cheers
> --
> Torsten

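Torsten mentions importing large TSVs, but the thread never says what the columns mean, which is exactly the open question about the input format. As a sketch only, assuming a hypothetical layout of one `rowKey<TAB>columnName<TAB>value` triple per line, the lines can be grouped into one column map per row before something else writes those rows into Cassandra. `TsvRowGrouper` and `groupByRow` are illustrative names, not part of CassandraBulkLoader, and the code does not talk to Cassandra at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Cassandra): groups TSV lines of the
// ASSUMED form rowKey<TAB>columnName<TAB>value into
// rowKey -> (columnName -> value), preserving input order.
public class TsvRowGrouper {

    public static Map<String, Map<String, String>> groupByRow(Reader input) throws IOException {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        // The caller owns the Reader, so it is not closed here.
        BufferedReader reader = new BufferedReader(input);
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // limit -1 keeps trailing empty fields, so an empty value survives
            String[] fields = line.split("\t", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                        "line " + lineNo + ": expected 3 tab-separated fields, got " + fields.length);
            }
            // A repeated column for the same row overwrites the earlier value.
            rows.computeIfAbsent(fields[0], k -> new LinkedHashMap<>())
                .put(fields[1], fields[2]);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String tsv = "user1\tname\tAlice\nuser1\tcity\tParis\nuser2\tname\tBob\n";
        Map<String, Map<String, String>> rows = groupByRow(new StringReader(tsv));
        System.out.println(rows); // {user1={name=Alice, city=Paris}, user2={name=Bob}}
    }
}
```

This keeps every row in memory, which is fine for illustration; for very large files already sorted by row key, one would instead emit each row as soon as the key changes.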

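Torsten's answer to the HDFS question is that storage-conf.xml has to be on the local disk of the machine running the loader, because that process joins the ring itself. A minimal pre-flight check for that host might look like the sketch below. It assumes the 0.6-era convention that the directory holding storage-conf.xml is passed in the `storage-config` system property (as in `-Dstorage-config=...`); verify that property name against the Cassandra version you run. `StorageConfCheck` and `locateStorageConf` are hypothetical names.

```java
import java.io.File;

// Hypothetical pre-flight check for a bulk-loader host. ASSUMPTION: the
// directory containing storage-conf.xml is given by the "storage-config"
// system property, as in 0.6-era startup scripts; check your version.
public class StorageConfCheck {

    static final String CONF_DIR_PROPERTY = "storage-config"; // assumed property name
    static final String CONF_FILE = "storage-conf.xml";

    /** Returns the local storage-conf.xml, or throws if it cannot be found. */
    public static File locateStorageConf(String confDir) {
        if (confDir == null || confDir.isEmpty()) {
            throw new IllegalStateException(
                    "system property '" + CONF_DIR_PROPERTY + "' is not set");
        }
        File conf = new File(confDir, CONF_FILE);
        if (!conf.isFile()) {
            // The loader reads this file from local disk, not from HDFS.
            throw new IllegalStateException(conf.getPath() + " not found on this machine");
        }
        return conf;
    }

    public static void main(String[] args) {
        File conf = locateStorageConf(System.getProperty(CONF_DIR_PROPERTY));
        System.out.println("Using " + conf.getAbsolutePath());
    }
}
```

For example, `java -Dstorage-config=/path/to/conf StorageConfCheck`, where the path is a placeholder for the directory holding the storage-conf.xml copied from your cluster.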

--
Thanks,
Mubarak Seyed.