cassandra-user mailing list archives

From "Vegard Berget" <p...@fantasista.no>
Subject Re: How long does it take for a write to actually happen?
Date Wed, 09 Jan 2013 14:32:33 GMT
Hi,
The timestamp is generated on the client side, so if you have two
clients that each set the timestamp from their own system time, and
their clocks disagree, you will run into trouble.  I don't know how
Astyanax does it, and I am not sure whether it would also cause trouble
when reading data.  Could it be that the Process server actually saw
the information but tried to update it with a lower timestamp, so the
update kept losing until 40 seconds had passed?  From
http://wiki.apache.org/cassandra/DataModel: "All values are supplied by
the client, including the 'timestamp'. This means that clocks on the
clients should be synchronized (in the Cassandra server environment is
useful also), as these timestamps are used for conflict resolution. In
many cases the 'timestamp' is not used in client applications, and it
becomes convenient to think of a column as a name/value pair. For the
remainder of this document, 'timestamps' will be elided for
readability. It is also worth noting the name and value are binary
values, although in many applications they are UTF8 serialized
strings."
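
The conflict resolution described in that quote can be pictured with a
minimal sketch. This is an in-memory model, not Cassandra's or Astyanax's
actual code: the Column class, the reconcile function and the 40 second
skew are assumptions made up for illustration, and tie handling is
simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # microseconds since the epoch, supplied by the client


def reconcile(current: Column, incoming: Column) -> Column:
    """Keep whichever version carries the higher client-supplied timestamp."""
    # Simplification: on a tie this sketch keeps the current column.
    return incoming if incoming.timestamp > current.timestamp else current


# Assumption: client A's clock runs 40 s ahead of client B's.
SKEW_US = 40 * 1_000_000
true_now_us = 1_357_740_000_000_000  # arbitrary instant, for illustration only

# A writes status=1 and stamps it with its fast clock.
written_by_a = Column("status", "1", true_now_us + SKEW_US)
# B updates one second later in real time, stamped with its slower clock.
update_by_b = Column("status", "2", true_now_us + 1_000_000)

stored = reconcile(written_by_a, update_by_b)
print(stored.value)  # prints "1": B's update carries the lower timestamp and loses
```

In this model B's update only starts to win once B's clock has passed the
timestamp A originally wrote, which would match a delay of roughly the
size of the clock difference.
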
.vegard, 
----- Original Message -----
From: user@cassandra.apache.org
To:
Cc:
Sent:Wed, 9 Jan 2013 15:56:08 +0200
Subject:Re: How long does it take for a write to actually happen?

Aaron, thanks a lot for your response! It gave us many ideas for future
refactorings.

Meanwhile, while trying to monitor Cassandra response times on all 3
servers (online, offline and Cassandra itself), I noticed that the
system time was different on all 3. After I ran ntpdate on all of
them, the problem was gone! Changes that the offline server saves in
Cassandra are now immediately visible to the online server.
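
A quick way to quantify such a difference is to compare the epoch time
each host reports at roughly the same moment. The sketch below assumes
the readings have already been collected somehow (for example by running
`date +%s` on each machine); the host names and values are hypothetical.

```python
from typing import Mapping


def max_clock_skew(readings: Mapping[str, float]) -> float:
    """Return the largest pairwise difference, in seconds, between host clocks."""
    return max(readings.values()) - min(readings.values())


# Hypothetical epoch-second readings taken from each host at about the same time.
readings = {
    "online": 1357740040.0,
    "offline": 1357740000.0,
    "cassandra": 1357740012.5,
}

skew = max_clock_skew(readings)
print(f"max skew: {skew:.1f} s")  # prints "max skew: 40.0 s"
```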

Unfortunately, I cannot explain why the system time on the client
machine matters, but I really hope that I have found the root cause of
the problem, and that it is not just a coincidence that performance
improved after I synced the clocks.

Best,
Vitaly Sourikov

On Wed, Jan 9, 2013 at 4:24 AM, aaron morton wrote:
> EC2 m1.large node
You will have a much happier time if you use an m1.xlarge.

> We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"
That's a pretty low new heap size.

> checks for new entries (in "Entries" CF, with indexed column
> status=1), processes them, and sets the status to 2, when done
This is not the best data model.  You may be better off having one CF
for the unprocessed entries and one for the processed ones.  Or, if you
really need a queue, use something like Kafka.

> I will appreciate any advice on how to speed the writes up,
Writes are instantly available for reading.  The first thing I would do
is see where the delay is.  Use nodetool cfstats to see the local write
latency, or track the write latency from the client perspective.

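Tracking write latency from the client side can be as simple as timing
each call. The following is only a sketch: timed_write and fake_write
are hypothetical names, and fake_write stands in for whatever real
client call (for example an Astyanax mutation) performs the write.

```python
import time
from statistics import mean
from typing import Callable, List


def timed_write(write: Callable[[], None], samples_ms: List[float]) -> None:
    """Run one write and record how long it took, in milliseconds."""
    start = time.perf_counter()
    write()
    samples_ms.append((time.perf_counter() - start) * 1000.0)


def fake_write() -> None:
    """Hypothetical stand-in for a real client write call."""
    time.sleep(0.002)


samples: List[float] = []
for _ in range(5):
    timed_write(fake_write, samples)

print(f"writes: {len(samples)}, mean: {mean(samples):.2f} ms, "
      f"max: {max(samples):.2f} ms")
```

If the client-side numbers are small while the reader still sees stale
data for tens of seconds, the delay is probably not in the write path
itself.
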
If you are looking for near real time / continuous computation style
processing, take a look at http://storm-project.net/ [2] and register
for this talk from Brian O'Neill, one of my fellow DataStax MVPs:
http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html
[3]

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com [4]

 On 9/01/2013, at 5:48 AM, Vitaly Sourikov  wrote: 
 Hi,
we are currently at an early stage of our project and have only one
Cassandra 1.1.7 node, hosted on an EC2 m1.large instance, where the
data is written to the ephemeral disk, and /var/lib/cassandra/data is
just a soft link to it. Commit logs and caches are still on
/var/lib/cassandra/. We set MAX_HEAP_SIZE="6G" and
HEAP_NEWSIZE="400M".

On the client side, we use Astyanax 1.56.18 to access the data.  We
have a processing server that writes to Cassandra, and an online
server that reads from it. The former wakes up every 0.5-5 seconds,
checks for new entries (in the "Entries" CF, with indexed column
status=1), processes them, and sets the status to 2 when done. The
online server checks once a second whether an entry that should be
processed has reached status 2 and, if so, sends it to its client side
for display. Processing takes 5-10 seconds and updates various columns
in the "Entries" CF a few times along the way. One of these columns may
contain ~12KB of textual data; the others are just short strings or
numbers.

Now, our problem is that it takes 20-40 seconds before the online
server actually sees the change, which is far too long: this process
is supposed to be nearly real-time. Moreover, in cqlsh, if I perform a
similar update, it is immediately visible in the following select
results, but the updates from the back-end server likewise do not
appear there for 20-40 seconds.

I tried switching the row caches on and off, both for that table and
in the yaml. I tried commitlog_sync: batch with
commitlog_sync_batch_window_in_ms: 50. Nothing helped.

I will appreciate any advice on how to speed the writes up, or at
least an explanation why this happens.

thanks,
Vitaly

 

Links:
------
[1] mailto:aaron@thelastpickle.com
[2] http://storm-project.net/
[3] http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html
[4] http://www.thelastpickle.com
[5] mailto:vitaly.sourikov@gmail.com

