cassandra-commits mailing list archives

From "Jeff Hodges (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-342) hadoop integration
Date Thu, 20 Aug 2009 08:20:14 GMT


Jeff Hodges commented on CASSANDRA-342:

So, my biggest problem with this patch right now is the boot-up code and the way it combines
with the local-only query code. It forces us to boot a brand-new Cassandra instance, but only
while a MapReduce task is running, and that instance assumes the data is already on disk and
ready for the taking. This is all sorts of bad news.

There does not seem to be a way to get at the Cassandra internals we need (reading from and
writing to the disk and memtable, figuring out which keys are on which nodes, etc.) without
also booting all of the various Cassandra services.

I'm looking for input on how we can get around that. 

FYI, the HBase way is to have HBase already running on the machine and open a connection to
it from a separate process. That process is created with the information from the InputSplit
(on the map task machines) and from the config files (on the initial machine that creates the
InputSplits).

> hadoop integration
> ------------------
>                 Key: CASSANDRA-342
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>         Attachments: 0001-CASSANDRA-342.-Set-up-for-the-hadoop-commits.patch, 0001-the-stupid-version-of-hadoop-support.patch,
>                      0002-CASSANDRA-342.-Working-hadoop-support.patch, 0003-CASSNADRA-342.-Adding-the-WordCount-example.patch,
> Some discussion on -dev:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
