incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jun Rao <jun...@almaden.ibm.com>
Subject Re: Cassandra + Hadoop + BMT
Date Tue, 08 Sep 2009 22:24:51 GMT

I was trying to understand how the MapReduce job figures out where a row is
located in a cassandra cluster and I saw the following code. Does this
really work? To compute the proper endpoints, the StorageService needs to
be started to obtain all tokens from other nodes through gossip. However,
StorageService is not started in the MapReduce job.

    for (EndPoint endpoint : StorageService.instance
().getReadStorageEndPoints(rowKey)) {
      /* Send message to end point */
      MessagingService.getMessagingInstance().sendOneWay(message,
endpoint);
    }

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com



|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Johan Oskarsson <johan@oskarsson.nu>                                            
                                                                 |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |cassandra-user@incubator.apache.org                                                   
                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Cc:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |cassandra-dev@incubator.apache.org                                                    
                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |09/01/2009 12:49 PM                                                                   
                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: Cassandra + Hadoop + BMT                                                          
                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|






I have slapped together a basic Hadoop 0.18 CassandraOutputFormat based
on the code Chris put up.

Usage:
conf.setOutputKeyClass(RowColumn.class);
conf.setOutputValueClass(BytesWritable.class);

conf.setOutputFormat(CassandraOutputFormat.class);
conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME,
"columnfamilyname");
conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");

DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);

+ your job specific settings.

Then after the job run this method: CassandraOutputFormat.forceFlush

Source code here:
http://github.com/johanoskarsson/cassandraoutputformat/tree/master

Big thanks to Chris for figuring out the mystery that is BinaryMemtable

/Johan

Chris Goffinet wrote:
> Hi Guys
>
> This is long overdue but I have posted a very rough rough example (with
> Digg stuff removed) for getting BMT working with Cassandra. Patches are
> coming next up for the JIRA tickets. I'll try to get a more generic
> map/reduce job finished by end of the week that integrates Hive output.
>
> http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master
>
> -Chris



Mime
  • Unnamed multipart/related (inline, None, 0 bytes)
View raw message