cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How long does it take for a write to actually happen?
Date Wed, 09 Jan 2013 02:24:55 GMT
> EC2 m1.large node
You will have a much happier time if you use an m1.xlarge.

> We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"  
That's a pretty low new heap size.

> checks for new entries (in "Entries" CF, with indexed column status=1), processes them,
> and sets the status to 2, when done
This is not the best data model. 
You may be better off with one CF for the unprocessed entries and one for the processed ones. 
Or, if you really need a queue, use something like Kafka. 
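
A rough sketch of the two-CF idea, using plain Python dicts as stand-ins for the column families. No Cassandra client is involved, and every name here (`unprocessed`, `processed`, `submit`, `process_pending`, `fetch_result`) is hypothetical; the point is only that the online server reads by key instead of polling an indexed status column.

```python
# Plain dicts stand in for two column families; this is not a Cassandra client.
unprocessed = {}  # hypothetical CF for entries waiting to be processed
processed = {}    # hypothetical CF for entries that are done


def submit(key, columns):
    """Writer side: a new entry goes into the unprocessed CF."""
    unprocessed[key] = dict(columns)


def process_pending(work):
    """Processing server: handle every pending entry, then move it.

    The result is written to the processed CF before the pending row is
    deleted, so a crash in between leaves a duplicate, not a lost entry.
    """
    done = []
    for key in list(unprocessed):
        processed[key] = work(unprocessed[key])
        del unprocessed[key]
        done.append(key)
    return done


def fetch_result(key):
    """Online server: a single read by key, no secondary index on status."""
    return processed.get(key)
```

With this layout, "is the entry ready?" becomes a lookup in the processed CF rather than a secondary-index query on status.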

> I will appreciate any advice on how to speed the writes up,
Writes are instantly available for reading. 
The first thing I would do is see where the delay is. Use nodetool cfstats to see the
local write latency, or track the write latency from the client perspective. 
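
A minimal sketch of tracking write latency from the client side, assuming the write is wrapped in some callable. `write_fn`, `timed_write` and `summarize` are hypothetical names, not part of Astyanax or any other client; the same pattern carries over to whatever language the client is written in.

```python
import time


def timed_write(write_fn, *args, **kwargs):
    """Call write_fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = write_fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def summarize(latencies_ms):
    """Reduce a non-empty list of latencies to count, mean and max."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "mean_ms": sum(ordered) / len(ordered),
        "max_ms": ordered[-1],
    }
```

If the client-side numbers are small and roughly match the local write latency from cfstats, the 20-40 seconds is probably being spent somewhere other than the write itself.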

If you are looking for near real time / continuous computation style processing, take a look
at http://storm-project.net/ and register for this talk from Brian O'Neill, one of my fellow
DataStax MVPs: http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 5:48 AM, Vitaly Sourikov <vitaly.sourikov@gmail.com> wrote:

> Hi,
> we are currently at an early stage of our project and have only one Cassandra 1.1.7 node
> hosted on EC2 m1.large node, where the data is written to the ephemeral disk, and /var/lib/cassandra/data
> is just a soft link to it. Commit logs and caches are still on /var/lib/cassandra/. We set
> MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
> 
> On the client-side, we use Astyanax 1.56.18 to access the data.  We have a processing
> server that writes to Cassandra, and an online server that reads from it. The former wakes
> up every 0.5-5 sec., checks for new entries (in "Entries" CF, with indexed column status=1),
> processes them, and sets the status to 2, when done. The online server checks once a second
> if an entry that should be processed got the status 2 and sends it to its client side for
> display. Processing takes 5-10 seconds and updates various columns in the "Entries" CF a few
> times on the way. One of these columns may contain ~12KB of textual data, others are just
> short strings or numbers.
> 
> Now, our problem is that it takes 20-40 seconds before the online server actually sees
> the change, and that is way too long; this process is supposed to be nearly real-time. Moreover,
> in cqlsh, if I perform a similar update, it is immediately seen in the following select results,
> but the updates from the back-end server also do not appear for 20-40 seconds.
> 
> I tried switching the row caches for that table and in yaml on and off. I tried commitlog_sync:
> batch with commitlog_sync_batch_window_in_ms: 50. Nothing helped.
> 
> I will appreciate any advice on how to speed the writes up, or at least an explanation
> why this happens.
> 
> thanks,
> Vitaly

