cassandra-user mailing list archives

From Dean Hiller <>
Subject Re: How to Optimize Cassandra Updates (Use of memtables)
Date Tue, 24 Jul 2012 16:32:37 GMT
I am guessing you already asked if they could give you three 100MB files
instead, so you could parallelize the operation? Or maybe your task
doesn't lend itself well to that.
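If the input can be split, the merge step is easy because counter increments commute, so slices can be aggregated independently and the partial counts summed at the end. A minimal sketch of that idea (the column-0-is-the-key layout is an assumption, not something from the thread):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(rows):
    # Aggregate counter deltas for one slice of the CSV rows.
    # Assumes (hypothetically) that column 0 holds the counter key.
    return Counter(row[0] for row in rows)

def parallel_counts(rows, workers=3):
    # Split the rows into slices, aggregate each slice independently,
    # then merge: increments commute, so order does not matter.
    chunk = max(1, len(rows) // workers)
    slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_chunk, slices)
    total = Counter()
    for partial in partials:
        total += partial
    return total

rows = [("a",), ("b",), ("a",), ("c",), ("a",)]
totals = parallel_counts(rows)
# totals == Counter({"a": 3, "b": 1, "c": 1})
```

With three separate 100MB files the same merge applies: each file yields a partial `Counter`, and the sums become the counter deltas to apply.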


On Tue, Jul 24, 2012 at 10:01 AM, Pushpalanka Jayawardhana <> wrote:

> Hi all,
> I am dealing with a scenario where I receive a .csv file at 10-minute
> intervals, averaging 300MB each. After some processing, I need to update a
> Cassandra cluster according to the data in the .csv file.
> My current approach keeps a HashMap in memory, updating it with the data
> gathered from the processed .csv files (mostly updates to counters). Then,
> periodically (say, at 2s intervals), the values in the HashMap are read one
> by one and written to Cassandra.
> I have tried generating SSTables and loading the data in batches via
> sstableloader, but that is a lot slower than my requirement of near
> real-time results.
> Are there any hints on what I could try? Is there any way to directly
> update values in a memtable (instead of using a HashMap) and send that to
> Cassandra, rather than loading via SSTables?
> --
> Pushpalanka Jayawardhana
