cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject RE: Map/Reduce Cassandra Output
Date Mon, 19 Apr 2010 18:20:08 GMT
Yes: if you used that snippet of code, all connections would go through the same seed. The
input code does additional work to determine which nodes hold particular key ranges, and
then connects to those nodes directly.
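To make the difference concrete, here is a minimal sketch in plain Java. It is not the input code's range-aware lookup and not how Hector works; SeedPicker, firstSeed, roundRobin and the host addresses are hypothetical names and placeholders invented for this illustration. It only contrasts always taking the first seed, as the quoted snippet does, with rotating across a host list by task index.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Cassandra or Hector.
final class SeedPicker {
    private SeedPicker() {}

    // Mirrors getSeeds().iterator().next(): every caller gets the same host.
    static String firstSeed(List<String> hosts) {
        return hosts.iterator().next();
    }

    // Naive round-robin: the task index selects the host, so tasks spread out.
    static String roundRobin(List<String> hosts, int taskIndex) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured");
        }
        return hosts.get(Math.floorMod(taskIndex, hosts.size()));
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real job would read these from its configuration.
        List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        for (int task = 0; task < 4; task++) {
            System.out.println("task " + task
                    + " firstSeed=" + firstSeed(hosts)
                    + " roundRobin=" + roundRobin(hosts, task));
        }
    }
}
```

Rotation like this only evens out which node receives each connection. It does not know which node owns a given key, and it does nothing about failed hosts, which is part of why a dedicated client is the more practical choice for output.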

----

For outputting from Hadoop to Cassandra, you may want to consider using a Java client like
Hector, which will handle the load balancing for you.

http://github.com/rantav/hector

Thanks,
Stu

-----Original Message-----
From: "Sonny Heer" <sonnyheer@gmail.com>
Sent: Monday, April 19, 2010 11:29am
To: cassandra-user@incubator.apache.org
Subject: Map/Reduce Cassandra Output

Unlike the wordcount example, my input source is a directory, and I
have a split class and record reader defined.

Also unlike wordcount, I need to insert into Cassandra during the
reduce phase.  I notice that the wordcount input code retrieves a
handle on a Cassandra client like this:

        TSocket socket = new TSocket(DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
                                     DatabaseDescriptor.getThriftPort());
        TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
        Cassandra.Client client = new Cassandra.Client(binaryProtocol);

Would all Hadoop nodes go to the same seed if I use this code to
insert data, with no load balancing?  Has this been done somewhere in
the Cassandra code already?


