From: Alexandre Linares <linares@ymail.com>
To: cassandra-user@incubator.apache.org
Date: Thu, 21 May 2009 09:27:25 -0700 (PDT)
Subject: Re: Ingesting from Hadoop to Cassandra

Jonathan,

Thanks for your thoughts.

I've done some simple benchmarks with the batch insert APIs and was looking for something slightly more performant.  Is there a batch row insert that I missed?
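One way to push many rows with fewer requests is to group them on the client side and send each group in a single call. The following is a minimal sketch under stated assumptions: `MultiRowBatcher`, `RowMutation`, and `BatchSink` are hypothetical names, not part of Cassandra's client API, and `BatchSink.send` stands in for whatever call actually reaches the cluster. Whether the server accepts several rows in one request is itself an assumption to check against the API in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types come from Cassandra's client API.
class MultiRowBatcher {

    /** One row's worth of columns to write (hypothetical type). */
    static final class RowMutation {
        final String rowKey;
        final Map<String, byte[]> columns;

        RowMutation(String rowKey, Map<String, byte[]> columns) {
            this.rowKey = rowKey;
            this.columns = columns;
        }
    }

    /** Stand-in for whatever transport actually sends a batch to the cluster. */
    interface BatchSink {
        void send(List<RowMutation> batch);
    }

    private final BatchSink sink;
    private final int maxRowsPerBatch;
    private final List<RowMutation> pending = new ArrayList<>();

    MultiRowBatcher(BatchSink sink, int maxRowsPerBatch) {
        if (maxRowsPerBatch <= 0) {
            throw new IllegalArgumentException("maxRowsPerBatch must be positive");
        }
        this.sink = sink;
        this.maxRowsPerBatch = maxRowsPerBatch;
    }

    /** Queue a row; send a full batch as soon as the threshold is reached. */
    void add(RowMutation row) {
        pending.add(row);
        if (pending.size() >= maxRowsPerBatch) {
            flush();
        }
    }

    /** Send whatever is buffered; call once more after the last row. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.send(new ArrayList<>(pending)); // copy so the sink may keep the list
        pending.clear();
    }
}
```

The batch size is the knob worth benchmarking: larger batches mean fewer round-trips but more memory per request.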

Any pointers (at all) to anything related to FB's bulk loading or the binarymemtable?  I've attempted to do this by writing a custom IVerbHandler for ingestion and interfacing with the MessagingService internally but it's not that clean.
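The approach described here relies on a general pattern: a registry that routes each incoming message, by its verb, to a matching handler. The sketch below illustrates only that pattern. `VerbDispatcher`, `Message`, and `VerbHandler` are invented names, and they do not reproduce the signatures of Cassandra's internal `IVerbHandler` or `MessagingService`.

```java
import java.util.HashMap;
import java.util.Map;

// Generic, hypothetical illustration of verb-based message dispatch.
// It does NOT reproduce Cassandra's IVerbHandler or MessagingService.
class VerbDispatcher {

    /** A minimal message: a verb naming the operation plus an opaque body. */
    static final class Message {
        final String verb;
        final byte[] body;

        Message(String verb, byte[] body) {
            this.verb = verb;
            this.body = body;
        }
    }

    /** Handler for one verb, e.g. a custom ingestion handler. */
    interface VerbHandler {
        void handle(Message message);
    }

    private final Map<String, VerbHandler> handlers = new HashMap<>();

    /** Associate a verb with the handler that should process it. */
    void register(String verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    /** Route a message to its handler; returns false if none is registered. */
    boolean dispatch(Message message) {
        VerbHandler handler = handlers.get(message.verb);
        if (handler == null) {
            return false;
        }
        handler.handle(message);
        return true;
    }
}
```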

Thanks again,
-Alex


From: Jonathan Ellis <jbellis@gmail.com>
To: cassandra-user@incubator.apache.org
Sent: Thursday, May 21, 2009 7:44:59 AM
Subject: Re: Ingesting from Hadoop to Cassandra

Have you benchmarked the batch insert APIs?  If that is "fast enough"
then it's by far the simplest way to go.

Otherwise you'll have to use the binarymemtable stuff which is
undocumented and not exposed as a client API (you basically write a
custom "loader" version of Cassandra to use it, I think).  FB used
this for their own bulk loading so it works at some level, but clearly
there is some assembly required.
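Deciding whether the batch APIs are "fast enough" requires a throughput number. A minimal timing harness could look like this. It is a sketch: `InsertBenchmark` is a hypothetical name, and the `writeBatch` callback stands in for whichever batch insert call is being measured.

```java
import java.util.function.IntConsumer;

// Hypothetical harness: writeBatch stands in for the real batch insert call.
class InsertBenchmark {

    /**
     * Calls writeBatch once per batch of up to batchSize rows (the last batch
     * may be smaller) and returns rows written per second of wall-clock time.
     */
    static double rowsPerSecond(int totalRows, int batchSize, IntConsumer writeBatch) {
        if (totalRows <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalRows and batchSize must be positive");
        }
        long start = System.nanoTime();
        for (int written = 0; written < totalRows; written += batchSize) {
            writeBatch.accept(Math.min(batchSize, totalRows - written));
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid division by zero
        return totalRows * 1_000_000_000.0 / elapsedNanos;
    }
}
```

Running it against a live cluster at several batch sizes shows where throughput stops improving.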
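A memtable-based bulk load generally works by accumulating writes in memory, sorted by key, and handing them off in large sorted chunks instead of sending one request per column. The following hypothetical sketch covers only that buffering step. `SortedWriteBuffer` and its flush callback are invented for illustration. The sketch does not reproduce Cassandra's undocumented binary memtable format or how a loader ships data to the nodes.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Generic illustration of the memtable idea: buffer writes sorted by row key and
// column name, then hand them off in bulk. Not Cassandra's binary memtable.
class SortedWriteBuffer {

    private SortedMap<String, SortedMap<String, byte[]>> rows = new TreeMap<>();
    private final int maxColumns;
    private final Consumer<SortedMap<String, SortedMap<String, byte[]>>> flusher;
    private int columnCount = 0;

    SortedWriteBuffer(int maxColumns, Consumer<SortedMap<String, SortedMap<String, byte[]>>> flusher) {
        this.maxColumns = maxColumns;
        this.flusher = flusher;
    }

    /** Record one column; a later write to the same row/column overwrites the earlier one. */
    void put(String rowKey, String column, byte[] value) {
        SortedMap<String, byte[]> row = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        if (row.put(column, value) == null) {
            columnCount++; // count only distinct columns
        }
        if (columnCount >= maxColumns) {
            flush();
        }
    }

    /** Hand the whole buffer, already sorted, to the flusher and start a new one. */
    void flush() {
        if (rows.isEmpty()) {
            return;
        }
        SortedMap<String, SortedMap<String, byte[]>> full = rows;
        rows = new TreeMap<>();
        columnCount = 0;
        flusher.accept(full);
    }
}
```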

-Jonathan

On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares <linares@ymail.com> wrote:
> Hi all,
>
> I'm trying to find the best way to ingest my content from Hadoop to
> Cassandra.  Assuming I have figured out the table representation for this
> content, what is the best way to go about pushing from my cluster?  What
> Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm
> sure this is a common pattern, I'm curious to see how it has been
> implemented.  Assume millions of rows and 1000s of columns.
>
> Thanks in advance,
> -Alex
>
>
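With thousands of columns per row, a single row can also be split into several smaller writes so that no one request carries every column. This is a hypothetical helper sketched under that assumption. `WideRowSplitter` is an invented name, and the right chunk size is something to benchmark rather than a known limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: split one wide row's columns into chunks of at most
// maxColumnsPerChunk, preserving the input map's iteration order.
class WideRowSplitter {

    static <V> List<Map<String, V>> split(Map<String, V> columns, int maxColumnsPerChunk) {
        if (maxColumnsPerChunk <= 0) {
            throw new IllegalArgumentException("maxColumnsPerChunk must be positive");
        }
        List<Map<String, V>> chunks = new ArrayList<>();
        Map<String, V> current = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : columns.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == maxColumnsPerChunk) {
                chunks.add(current); // chunk is full, start a new one
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // leftover partial chunk
        }
        return chunks;
    }
}
```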
