incubator-cassandra-dev mailing list archives

From Johan Oskarsson <jo...@oskarsson.nu>
Subject Re: Cassandra + Hadoop + BMT
Date Wed, 09 Sep 2009 09:49:02 GMT
In my version of the code the storage endpoints are pulled out from a
seed node using the NodeProbe class and then put into the StorageService
using the updateTokenMetadata method.

See updateTokenMetadata in CassandraClient:
http://github.com/johanoskarsson/cassandraoutputformat/blob/dfa4dbf9b1bc81854b492af14536693002e19e52/src/java/fm/last/hadoop/mapred/CassandraClient.java

Granted, it's not a perfect solution.
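
For anyone following along, the idea can be sketched roughly as below. This is a minimal, self-contained illustration and not the code from the repository linked above: the class `TokenRingSketch`, its methods, and the addresses are hypothetical stand-ins, and the MD5-derived token is only an assumption loosely modeled on a random partitioner, so the exact derivation and replica placement in Cassandra may differ. It shows a client filling a local token map from what a seed node reports, then looking up endpoints for a row key without running gossip itself.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the token metadata a client keeps locally. */
final class TokenRingSketch {
    // token -> endpoint address, sorted by token so the ring can be walked in order
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    /** Replace the local view with the token-to-endpoint map reported by a seed node. */
    public void updateTokenMetadata(Map<BigInteger, String> tokenToEndpoint) {
        ring.clear();
        ring.putAll(tokenToEndpoint);
    }

    /** Assumption: a key's token is the MD5 hash of the key, read as a non-negative integer. */
    static BigInteger tokenFor(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Walk the ring from the key's token and collect up to `replicas` distinct endpoints. */
    public List<String> endpointsFor(BigInteger keyToken, int replicas) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        // First token at or after the key's token; wrap to the start of the ring if none.
        BigInteger start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey();
        }
        BigInteger token = start;
        do {
            String endpoint = ring.get(token);
            if (!result.contains(endpoint)) {
                result.add(endpoint);
            }
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        } while (result.size() < replicas && !token.equals(start));
        return result;
    }

    public static void main(String[] args) {
        // Made-up tokens and addresses standing in for what a seed node would report.
        Map<BigInteger, String> fromSeed = new TreeMap<>();
        fromSeed.put(BigInteger.valueOf(100), "10.0.0.1");
        fromSeed.put(BigInteger.valueOf(200), "10.0.0.2");
        fromSeed.put(BigInteger.valueOf(300), "10.0.0.3");

        TokenRingSketch metadata = new TokenRingSketch();
        metadata.updateTokenMetadata(fromSeed);
        System.out.println(metadata.endpointsFor(tokenFor("some-row-key"), 2));
    }
}
```

The sorted map keeps the lookup to a single `ceilingKey` call plus a short walk. The weakness is the same as in the real thing: the view is a snapshot, so it goes stale if the ring changes while the job is running.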

/Johan

Jun Rao wrote:
> I was trying to understand how the MapReduce job figures out where a row
> is located in a cassandra cluster and I saw the following code. Does
> this really work? To compute the proper endpoints, the StorageService
> needs to be started to obtain all tokens from other nodes through
> gossip. However, StorageService is not started in the MapReduce job.
> 
>     for (EndPoint endpoint : StorageService.instance().getReadStorageEndPoints(rowKey)) {
>       /* Send message to end point */
>       MessagingService.getMessagingInstance().sendOneWay(message, endpoint);
>     }
> 
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> 
> junrao@almaden.ibm.com
> 
> From: Johan Oskarsson <johan@oskarsson.nu>
> To: cassandra-user@incubator.apache.org
> Cc: cassandra-dev@incubator.apache.org
> Date: 09/01/2009 12:49 PM
> Subject: Re: Cassandra + Hadoop + BMT
> 
> ------------------------------------------------------------------------
> 
> I have slapped together a basic Hadoop 0.18 CassandraOutputFormat based
> on the code Chris put up.
> 
> Usage:
> conf.setOutputKeyClass(RowColumn.class);
> conf.setOutputValueClass(BytesWritable.class);
> 
> conf.setOutputFormat(CassandraOutputFormat.class);
> conf.set(CassandraOutputFormat.CONF_COLUMN_FAMILY_NAME, "columnfamilyname");
> conf.set(CassandraOutputFormat.CONF_KEYSPACE, "keyspacename");
> 
> DistributedCache.addCacheFile(new URI("uri_to_storage-conf.xml"), conf);
> 
> + your job specific settings.
> 
> Then, after the job has run, call this method: CassandraOutputFormat.forceFlush
> 
> Source code here:
> http://github.com/johanoskarsson/cassandraoutputformat/tree/master
> 
> Big thanks to Chris for figuring out the mystery that is BinaryMemtable
> 
> /Johan
> 
> Chris Goffinet wrote:
>> Hi Guys
>>
>> This is long overdue but I have posted a very rough example (with
>> Digg stuff removed) for getting BMT working with Cassandra. Patches are
>> coming next up for the JIRA tickets. I'll try to get a more generic
>> map/reduce job finished by end of the week that integrates Hive output.
>>
>> http://github.com/lenn0x/Cassandra-Hadoop-BMT/tree/master
>>
>> -Chris