cassandra-user mailing list archives

From Drew Dahlke <drew.dah...@bronto.com>
Subject Re: Map Reduce support
Date Mon, 28 Jun 2010 13:32:40 GMT
I'm afraid I didn't hold on to it. Sorry, folks.

On Mon, Jun 28, 2010 at 8:58 AM, Carlos Sanchez
<carlos.sanchez@riskmetrics.com> wrote:
> Drew,
>
> I was wondering if you care to share your map-reduce code
>
> Thanks
>
> Carlos
> ________________________________________
> From: Drew Dahlke [drew.dahlke@bronto.com]
> Sent: Monday, June 28, 2010 7:17 AM
> To: user@cassandra.apache.org
> Subject: Re: Map Reduce support
>
> The difference is noticeable but small. I did a test just reading data
> in from Cassandra on our cluster & dumping it to a csv file. Pure map
> reduce was going at ~17k records/sec versus ~15k from Pig. There is
> overhead to using Pig, but it'll reduce your development time & make
> for more readable code if it suits your needs.
>
> On Sun, Jun 27, 2010 at 9:53 AM, Atul Gosain <atul.gosain@gmail.com> wrote:
>> Thanks for the information Drew and Jonathan.
>> Is there any difference in performance when using Pig compared to MapReduce
>> directly on the data store?
>> I will experiment with both of them when I get some time, though.
>>
>> On Fri, Jun 25, 2010 at 5:46 PM, Drew Dahlke <drew.dahlke@bronto.com> wrote:
>>>
>>> The Cassandra column family input format will go over an entire
>>> column family, sending a slice of a row into a mapper at a time. From
>>> there, there's a lot you can do. As far as how you aggregate data
>>> together, I'd suggest experimenting with the latest version of Pig,
>>> which thankfully supports the new input format. It gives you a
>>> SQL-esque syntax for manipulating the data and is probably the easiest
>>> way to experiment.
>>>
>>> On Thu, Jun 24, 2010 at 11:01 AM, Atul Gosain <atul.gosain@gmail.com>
>>> wrote:
>>> > Hi
>>> >   What kind of MapReduce support is provided for Cassandra?
>>> > Can I get some columns from different rows and then aggregate them
>>> > together? It's basically aggregation of statistics for various devices
>>> > connected to a network manager. Is this the right kind of use case to be
>>> > supported by MR?
>>> > Thanks
>>> > Atul
>>
>>
>
>
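
The code asked for in this thread was not kept, so the following is only a minimal sketch of the pattern discussed: each mapper call receives a slice of one row, and a reducer aggregates values that share a key. It is a pure in-memory Python illustration and does not use Hadoop, Pig, or Cassandra's column family input format. The row keys, column names, and the functions map_row, reduce_values, and run_job are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a column family:
# row key (one per device) -> {column name: numeric statistic}.
ROWS = {
    "device-1": {"bytes_in": 100, "bytes_out": 40},
    "device-2": {"bytes_in": 250, "bytes_out": 10},
    "device-3": {"bytes_in": 50},
}


def map_row(row_key, columns):
    """Map step: called once per row slice, emits (column name, value) pairs."""
    for name, value in columns.items():
        yield name, value


def reduce_values(name, values):
    """Reduce step: aggregates every value emitted under one key."""
    values = list(values)
    return {"count": len(values), "sum": sum(values)}


def run_job(rows):
    """Runs map, groups the emitted pairs by key (the shuffle), then reduces."""
    grouped = defaultdict(list)
    for row_key, columns in rows.items():
        for key, value in map_row(row_key, columns):
            grouped[key].append(value)
    return {key: reduce_values(key, vals) for key, vals in grouped.items()}


if __name__ == "__main__":
    for name, stats in sorted(run_job(ROWS).items()):
        print(name, stats)
```

Keying the map output by column name aggregates each statistic across all devices. Emitting the row key instead would aggregate per device.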
