Thanks Stu. I will take a look at Hector. Do you know where the
input code does the additional work?
On Mon, Apr 19, 2010 at 11:20 AM, Stu Hood <stu.hood@rackspace.com> wrote:
> If you used that snippet of code, all connections would go through the same seed: the
> input code does additional work to determine which nodes are holding particular key ranges,
> and then connects directly.
>
> ----
>
> For outputting from Hadoop to Cassandra, you may want to consider using a Java client
> like Hector, which will handle the load balancing for you.
>
> http://github.com/rantav/hector
>
> Thanks,
> Stu
>
> -----Original Message-----
> From: "Sonny Heer" <sonnyheer@gmail.com>
> Sent: Monday, April 19, 2010 11:29am
> To: cassandra-user@incubator.apache.org
> Subject: Map/Reduce Cassandra Output
>
> Unlike the wordcount example, my input source is a directory, and I
> have a split class and record reader defined.
>
> Also unlike wordcount, I need to insert into Cassandra during
> reduce. I notice that the wordcount input retrieves a handle on
> a Cassandra client like this:
>
> TSocket socket = new TSocket(
>         DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
>         DatabaseDescriptor.getThriftPort());
> TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
> Cassandra.Client client = new Cassandra.Client(binaryProtocol);
>
> Would all Hadoop nodes go to the same seed if I use this code to
> insert data, without balancing it? Has this been done somewhere in
> the Cassandra code already?
>
>
>
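A note on the quoted snippet: it always takes the first entry from the seed list, which is why every reducer would end up on the same node. A minimal sketch of one way to spread connections follows, assuming each reduce task can supply some integer id. The class name SeedPicker, the method pickHost, and the sample addresses are hypothetical and are not part of Cassandra, Hadoop, or Hector. It only rotates across a list of hosts; it does not look up which node owns a key range, as the input code does, and it does not give you the failover a client like Hector provides.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
public class SeedPicker {

    // Pick one host per task by taking the task id modulo the host count,
    // so that tasks are spread across the list instead of all using entry 0.
    static String pickHost(List<String> hosts, int taskId) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured");
        }
        // floorMod keeps the index non-negative even for a negative id.
        return hosts.get(Math.floorMod(taskId, hosts.size()));
    }

    public static void main(String[] args) {
        // Placeholder addresses; in a real job these would come from configuration.
        List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        for (int taskId = 0; taskId < 5; taskId++) {
            System.out.println("task " + taskId + " -> " + pickHost(hosts, taskId));
        }
    }
}
```

The chosen host would then be passed to the TSocket constructor in place of the first seed's address.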