If you used that snippet of code, every connection would go through the same seed. The input
code does additional work to determine which nodes hold particular key ranges, and
then connects to those nodes directly.
----
For outputting from Hadoop to Cassandra, you may want to consider using a Java client like
Hector, which will handle the load balancing for you.
http://github.com/rantav/hector
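
As a rough illustration of the kind of balancing a client library takes care of, here is a minimal sketch of round-robin host selection. The class name RoundRobinHosts and the host addresses are hypothetical, and this is not Hector's API or anything in the Cassandra code base. It only shows the idea of rotating over a list of nodes instead of always taking the first seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of Cassandra, Thrift, or Hector.
class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger counter = new AtomicInteger(0);

    RoundRobinHosts(List<String> hosts) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // Defensive copy so later changes to the caller's list do not affect us.
        this.hosts = new ArrayList<>(hosts);
    }

    // Returns the next host in rotation; safe to call from several threads.
    String next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real job would read these from its configuration.
        List<String> nodes = new ArrayList<>();
        nodes.add("10.0.0.1");
        nodes.add("10.0.0.2");
        nodes.add("10.0.0.3");

        RoundRobinHosts picker = new RoundRobinHosts(nodes);
        for (int i = 0; i < 4; i++) {
            System.out.println("connect to " + picker.next());
        }
    }
}
```

Each reducer could ask such a picker for a host before opening its socket, so writes are spread over the cluster rather than landing on a single seed. A real client would also need to handle failed nodes, which this sketch does not attempt.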
Thanks,
Stu
-----Original Message-----
From: "Sonny Heer" <sonnyheer@gmail.com>
Sent: Monday, April 19, 2010 11:29am
To: cassandra-user@incubator.apache.org
Subject: Map/Reduce Cassandra Output
Unlike the wordcount example, my input source is a directory, and I
have a split class and record reader defined.
Also unlike wordcount, I need to insert into Cassandra during the
reduce phase. I notice that the wordcount input code retrieves a handle on
a Cassandra client like this:
    TSocket socket = new TSocket(
            DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
            DatabaseDescriptor.getThriftPort());
    TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
    Cassandra.Client client = new Cassandra.Client(binaryProtocol);
Would all Hadoop nodes go to the same seed if I use this code to
insert data, with no load balancing? Has this been done somewhere in
the Cassandra code already?