incubator-cassandra-user mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: Updating (as opposed to just setting) Cassandra data via Hadoop
Date Fri, 07 May 2010 14:13:08 GMT
Does anyone have a feel for how performant m/r operations are when
backed by Cassandra as opposed to HDFS, in terms of network utilization
and the volume of data being pushed around?

jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Fri, May 7, 2010 at 08:54, Ian Kallen <spidaman.list@gmail.com> wrote:
> On 5/6/10 3:26 PM, Stu Hood wrote:
>>
>> Ian: I think that as get_range_slice gets faster, the approach that Mark
>> was heading toward may be considerably more efficient than reading the old
>> value in the OutputFormat.
>>
>
> Interesting; I'm trying to understand the performance impact of the
> different ways to do this. Under Mark's approach, the prior values are
> pulled out of Cassandra in the mapper in bulk, then merged and written back
> to Cassandra in the reducer; get_range_slice is faster than the
> individual row fetches that my approach does in the reducer. Is that what
> you mean, or are you referring to something else?
> thanks!
> -Ian
>
> --
> Ian Kallen
> blog: http://www.arachna.com/roller/spidaman
> tweetz: http://twitter.com/spidaman
> vox: 925.385.8426
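
To make the contrast concrete, here is a rough Java sketch of the two patterns under
discussion: a mapper that already has the prior values from a bulk, get_range_slice-style
read, versus a reducer that fetches each prior row individually before merging and writing
back. This is not the actual Cassandra Hadoop integration; the CassandraStore interface is
an invented stand-in for whatever Thrift or higher-level client calls a real job would use,
and the record types are simplified to Hadoop Text.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MergeStrategiesSketch {

        /** Invented stand-in for a Cassandra client; a real job would use Thrift/Hector calls. */
        interface CassandraStore {
            Map<String, String> fetchRow(String key);               // one round trip per row
            void writeRow(String key, Map<String, String> columns); // write merged columns back
        }

        /**
         * Mark's approach, as I read it: the prior values were already pulled out of
         * Cassandra in bulk before/inside the mapper, so the mapper just forwards them
         * alongside the new data and no per-row reads happen later in the job.
         */
        public static class BulkReadMapper extends Mapper<Text, Text, Text, Text> {
            @Override
            protected void map(Text rowKey, Text priorAndNewValue, Context context)
                    throws IOException, InterruptedException {
                // priorAndNewValue already carries the old columns from the bulk slice read;
                // emit them with the new data so the reducer can merge without reading again.
                context.write(rowKey, priorAndNewValue);
            }
        }

        /**
         * Ian's approach: the reducer fetches each prior row individually, merges the new
         * values into it, and writes the result back -- one extra read round trip per key.
         */
        public static class ReadModifyWriteReducer extends Reducer<Text, Text, Text, Text> {
            private CassandraStore store;

            @Override
            protected void setup(Context context) {
                // In a real job you would open a client connection here using values from
                // context.getConfiguration(); omitted in this sketch.
                store = openStore(context);
            }

            @Override
            protected void reduce(Text rowKey, Iterable<Text> newValues, Context context)
                    throws IOException, InterruptedException {
                // Individual fetch of the prior row for this key.
                Map<String, String> merged =
                        new HashMap<String, String>(store.fetchRow(rowKey.toString()));
                for (Text v : newValues) {
                    // Toy merge: each value is assumed to be "column=value".
                    String[] parts = v.toString().split("=", 2);
                    if (parts.length == 2) {
                        merged.put(parts[0], parts[1]);
                    }
                }
                store.writeRow(rowKey.toString(), merged);
            }

            private CassandraStore openStore(Context context) {
                throw new UnsupportedOperationException("sketch only: plug in a real client");
            }
        }
    }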
