Hi Jeremy,

> Why are you using Cassandra versus using data stored in HDFS or HBase?
- I'm thinking of using it for real-time streaming of user data. As the requests stream in, I'm also using Lucandra to index the data in real time. For this workload it's a better option than HBase or native HDFS flat files because of its low write latency.

> Is there anything holding you back from using it (if you would like to use it but currently cannot)?

My answer to this would be:
- The current integration only supports feeding the whole range of the CF into the map phase. It would be much better if the InputFormat supported restricting the input to a KeyRange.

Best Regards,
Utku

On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:
I'll be doing a presentation on Cassandra's (0.6+) hadoop integration next week. Is anyone currently using MapReduce or the initial Pig integration?

(If you're unaware of such integration, see http://wiki.apache.org/cassandra/HadoopSupport)

If so, could you post to this thread on how you're using it or planning on using it (if not covered by the shroud of secrecy)?

e.g.
What is the use case?

Why are you using Cassandra versus using data stored in HDFS or HBase?

Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps are you running the Job Tracker and Task Trackers on Cassandra nodes?

Is there anything holding you back from using it (if you would like to use it but currently cannot)?

Thanks!