Thanks for your thoughts.
I've done some simple benchmarks with the batch insert APIs and was looking for something slightly more performant. Is there a batch row insert call that I missed?
Any pointers (at all) to anything related to FB's bulk loading or the binarymemtable? I've attempted to do this by writing a custom IVerbHandler for ingestion and interfacing with the MessagingService internally, but it's not that clean.
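For context, the shape of the loop I benchmarked is roughly the following (a sketch only: `client.batch_insert` here is a placeholder for the Thrift-generated stub, not the exact signature, and the chunking threshold is arbitrary):

```python
def ingest(client, table, rows, columns_per_call=1000):
    """Push rows via per-key batch inserts.

    rows: iterable of (key, {column_name: value}) pairs.
    Each batch_insert is one round trip per key, so throughput is bounded
    by row count rather than column count -- hence the question about a
    multi-row batch call.  Returns the number of calls made.
    """
    calls = 0
    for key, columns in rows:
        names = list(columns)
        # Split very wide rows so a single call stays a reasonable size.
        for i in range(0, len(names), columns_per_call):
            chunk = {n: columns[n] for n in names[i:i + columns_per_call]}
            client.batch_insert(table, key, chunk)
            calls += 1
    return calls
```

With millions of rows this means millions of round trips, which is why I'm hoping for a multi-row variant.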
From: Jonathan Ellis <email@example.com>
Sent: Thursday, May 21, 2009 7:44:59 AM
Subject: Re: Ingesting from Hadoop to Cassandra
Have you benchmarked the batch insert APIs? If that is "fast enough"
then it's by far the simplest way to go.
Otherwise you'll have to use the binarymemtable stuff which is
undocumented and not exposed as a client api (you basically write a
custom "loader" version of cassandra to use it, I think). FB used
this for their own bulk loading so it works at some level, but clearly
there is some assembly required.
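[The "custom loader" approach, independent of Cassandra's internals (which are not shown here), generally starts by partitioning rows by destination node up front so each node receives one pre-sorted stream. A toy sketch of that grouping step, using a stand-in hash partitioner rather than the cluster's real token ring:]

```python
import hashlib

def partition_rows(rows, node_count):
    """Group (key, value) rows by destination node.

    Stand-in partitioner: hash the key and take it modulo the node count.
    A real loader would consult the cluster's own partitioner and ring;
    this only illustrates the grouping/sorting shape of the work.
    """
    buckets = {n: [] for n in range(node_count)}
    for key, value in rows:
        digest = hashlib.md5(key.encode()).digest()
        buckets[digest[0] % node_count].append((key, value))
    for bucket in buckets.values():
        bucket.sort()  # sorted input keeps the receiving side sequential
    return buckets
```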
On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares <firstname.lastname@example.org> wrote:
> Hi all,
> I'm trying to find the optimal way to ingest my content from Hadoop into
> Cassandra. Assuming I have figured out the table representation for this
> content, what is the best way to go about pushing from my cluster? Which
> Cassandra client batch APIs do you suggest I use to push to Cassandra? I'm
> sure this is a common pattern, and I'm curious to see how it has been
> implemented. Assume millions of rows and thousands of columns.
> Thanks in advance,