cassandra-user mailing list archives

From: Alexandre Linares <lina...@ymail.com>
Subject: Re: Ingesting from Hadoop to Cassandra
Date: Fri, 22 May 2009 01:20:14 GMT
Thanks, I'll see what I can do with this thought.

-Alex

________________________________
From: Jonathan Ellis <jbellis@gmail.com>
To: cassandra-user@incubator.apache.org
Sent: Thursday, May 21, 2009 9:42:29 AM
Subject: Re: Ingesting from Hadoop to Cassandra

No, batch APIs are per CF, not per row.
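To make that concrete, here is a rough sketch against the 0.3-era Thrift
interface; the struct and method names below (column_t, batch_mutation_t,
batch_insert) are from memory and may not match your checkout.  The point is
that one call covers a single row key, so you batch columns within that row
rather than batching many rows:

// Rough sketch only: assumes the 0.3-era Thrift interface.  The struct and
// method names (column_t, batch_mutation_t, batch_insert) may differ in
// your build.
void insertOneRow(Cassandra.Client client) throws Exception {
    long now = System.currentTimeMillis();
    List<column_t> cols = new ArrayList<column_t>();
    cols.add(new column_t("col1", "v1".getBytes(), now));
    cols.add(new column_t("col2", "v2".getBytes(), now));

    Map<String, List<column_t>> cfmap = new HashMap<String, List<column_t>>();
    cfmap.put("mycf", cols);                       // columns for one CF

    batch_mutation_t bm = new batch_mutation_t();  // assumed fields: key, cfmap
    bm.key = "row-key-1";
    bm.cfmap = cfmap;

    // One call covers one row key; batching is within the row, not across rows.
    client.batch_insert("mytable", bm, 1);         // block_for = 1 (assumed)
}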

Several people have asked Avinash for sample code using BinaryMemtable
but to my knowledge nothing ever came of that.

The high-level description of the BMT is that you give it serialized
CFs as values instead of raw columns, so it can just sort on key and
write directly to disk.  So then you would do something like this:

// Open the table and CF store in-process, then hand each pre-serialized
// row straight to the binary write path.
Table table = Table.open(mytablename);
ColumnFamilyStore store = table.getColumnFamilyStore(mycfname);
for (ColumnFamily cf : mydata) {
    store.applyBinary(cf.key, toByteArray(cf));
}

There's no provision for doing this over the network that I know of;
you have to put the right keys on the right nodes manually.
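As a rough sketch of that manual placement (getNodeFor below is a
hypothetical ring/token lookup, not an actual Cassandra API), you could
bucket the rows by destination node on the Hadoop side and run the loader
loop above on each node against its own bucket:

// Hypothetical sketch: bucket rows by the node that owns their key, so each
// node's in-process loader only applies keys it is responsible for.
// getNodeFor() stands in for whatever ring/token lookup you use.
Map<String, List<ColumnFamily>> perNode = new HashMap<String, List<ColumnFamily>>();
for (ColumnFamily cf : mydata) {
    String node = getNodeFor(cf.key);              // assumed ring lookup
    List<ColumnFamily> bucket = perNode.get(node);
    if (bucket == null) {
        bucket = new ArrayList<ColumnFamily>();
        perNode.put(node, bucket);
    }
    bucket.add(cf);
}
// Ship each bucket to its node (e.g. as a Hadoop output partition keyed by
// node) and run the Table/ColumnFamilyStore loop there.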

-Jonathan

On Thu, May 21, 2009 at 11:27 AM, Alexandre Linares <linares@ymail.com> wrote:
> Jonathan,
>
> Thanks for your thoughts.
>
> I've done some simple benchmarks with the batch insert APIs and was looking
> for something slightly more performant.  Is there a batch row insert that I
> missed?
>
> Any pointers (at all) to anything related to FB's bulk loading or the
> binarymemtable?  I've attempted to do this by writing a custom IVerbHandler
> for ingestion and interfacing with the MessagingService internally, but it's
> not that clean.
>
> Thanks again,
> -Alex
>
> ________________________________
> From: Jonathan Ellis <jbellis@gmail.com>
> To: cassandra-user@incubator.apache.org
> Sent: Thursday, May 21, 2009 7:44:59 AM
> Subject: Re: Ingesting from Hadoop to Cassandra
>
> Have you benchmarked the batch insert APIs?  If that is "fast enough"
> then it's by far the simplest way to go.
>
> Otherwise you'll have to use the binarymemtable stuff which is
> undocumented and not exposed as a client api (you basically write a
> custom "loader" version of Cassandra to use it, I think).  FB used
> this for their own bulk loading so it works at some level, but clearly
> there is some assembly required.
>
> -Jonathan
>
> On Thu, May 21, 2009 at 2:28 AM, Alexandre Linares <linares@ymail.com>
> wrote:
>> Hi all,
>>
>> I'm trying to find the best way to ingest my content from Hadoop to
>> Cassandra.  Assuming I have figured out the table representation for this
>> content, what is the best way to go about pushing from my cluster?  What
>> Cassandra client batch APIs do you suggest I use to push to Cassandra?  I'm
>> sure this is a common pattern, so I'm curious to see how it has been
>> implemented.  Assume millions of rows and thousands of columns.
>>
>> Thanks in advance,
>> -Alex
>>
>>
>
>


