cassandra-user mailing list archives

From Utku Can Topçu <u...@topcu.gen.tr>
Subject Re: Anyone using hadoop/MapReduce integration currently?
Date Tue, 25 May 2010 17:50:31 GMT
Hi Jeremy,

> Why are you using Cassandra versus using data stored in HDFS or HBase?
- I'm planning to use it for realtime streaming of user data. While
streaming the requests, I'm also using Lucandra to index the data in
realtime. Compared with HBase or native HDFS flat files, Cassandra is the
better option here because of its low write latency.

> Is there anything holding you back from using it (if you would like to use
> it but currently cannot)?

My answer to this would be:
- The current integration only supports using the whole range of the CF as
input to the map phase. It would be much better if the InputFormat
supported a KeyRange.

Best Regards,
Utku

On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:

> I'll be doing a presentation on Cassandra's (0.6+) hadoop integration next
> week. Is anyone currently using MapReduce or the initial Pig integration?
>
> (If you're unaware of such integration, see
> http://wiki.apache.org/cassandra/HadoopSupport)
>
> If so, could you post to this thread on how you're using it or planning on
> using it (if not covered by the shroud of secrecy)?
>
> e.g.
> What is the use case?
>
> Why are you using Cassandra versus using data stored in HDFS or HBase?
>
> Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps
> are you running the Job Tracker and Task Trackers on Cassandra nodes?
>
> Is there anything holding you back from using it (if you would like to use
> it but currently cannot)?
>
> Thanks!
