incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikael Wikblom <mikael.wikb...@sitevision.se>
Subject Re: CL - locally consistent ONE
Date Fri, 28 Oct 2011 13:47:59 GMT
On 10/28/2011 03:21 PM, Peter Schuller wrote:
>> During tests I've done mass mutations using an import of data. Using
>> CL.QUORUM the import takes around 3 times longer that using CL.ONE on a
>> cluster with 3 nodes.
> Is the test sequential or multi-threaded? A factor 3 performance
> difference seems like a lot in terms of total throughput; but it's
> easier to buy if it is due to higher latency resulting from use of
> QUORUM.

The import is single threaded and behavior is as expected (CL.ONE is 
much faster). The normal scenario is more multi-threaded in that sense 
that the servlet container serves multiple request on multiple threads. 
The problem with QUORUM reads and writes will not be as big in that 
case. But due to the design of the framework rendering a single page may 
result in several hundred reads which will then be sequential for the 
request.
> In other words: What I'm saying is that assuming bottlenecks are are
> being saturated (sufficient concurrency, etc) I would expect that
> CL.ONE and CL.QUORUM be roughly similar in their *throughput* (writes
> per second). However for a single sequential client, or fewer
> concurrent clients than necessary, it makes sense to see a
> significantly worse throughput as a result of higher latency, as low
> concurrency means you're not saturating the system.
>
> If you're really seeing a 3x difference with concurrency cranked up
> that might be worth investigating IMO.
>
>> I think this is an expected behavior though, in the CL.ONE case more or less
>> all reads / writes will be done locally whereas using QUORUM all operations
>> are done locally and remotely.
> It's expected that latency on an individual request will be higher
> because (1) you are doing another level of network round-trips, and
> (2) there is always variation in latency and having to wait for more
> than one means a higher probability of having to wait a bit longer
> just for that reason.
>
> If you use case is that you're e.g. doing a few writes, a few reads,
> and returning a page - higher latencies on these reads and writes are
> expected to translate to slower page (or whatever) load times. But - I
> still think that the overall throughput, as long as concurrency is
> high enough, should be roughly similar for ONE and QUORUOM *for
> writes*. For reads, CL.ONE is definitely expected to be faster even in
> terms of throughput, since you are in fact doing 1/RF times the amount
> of reads.
>
>> Giving the locally consistent ONE approach we avoid consistency problems
>> when the local node is slow, and the import time is more or less the same as
>> with CL.ONE. We do however introduce potentially slower  reads / writes when
>> the response time of local node it slower that any of the remote nodes,  but
>> because cassandra is embedded in SiteVision the entire jvm will probably
>> react slowly at this stage - a fast read from a remote node will not help
>> much.
> Makes sense.
>
>> An other problem with using QUORUM is that it does not scale well in the
>> case of big production environments. There are cases when a customer
>> temporarily uses up to 10 nodes where QUORUM would mean reading / writing to
>> 6 nodes.
> So writes would still be expensive, but CL.ONE should get you 1/6 the
> cost (CPU/disk) related to QUORUM in this case.
>
> I do want to point out though that it is somewhat of a fundamental
> limitation as long as you tie RF to cluster size, and will become a
> scalability problem even if you use CL.ONE due to reads (and for that
> matter due to repair becoming more expensive).
yes very true. It should be said here that a normal SiteVision database 
is rather small for a cassandra database. It is usually not bigger than 
a few GB (up to 40 GB).
> If I may ask, what are you primarly targeting here - is it about
> shaving a single millisecond or two off of the average request, or is
> it the number of requests per second in total for a single "task"
> (page load, etc), or is it overall throughput of the system (total
> amount of h/w required)?
>
> I'm mostly trying to figure out whether you are having some problem
> with latency or throughput that is not just a result of an extra few
> milli seconds of an additional round-trip time. If there is some other
> solution to the problem, the patching you describe, and limiting
> yourself in scaling by fixing RF=number of nodes, seems like kind of a
> high price to pay.
yes it is a high price, but than again, the databases are not really big
> Obviously I don't know fully your situation so take that with a grain
> of salt. Just my knee-jerk reaction from what I've read so far.
Yes. Maybe it is time to investigate the possibilities to change the 
store / retrieval patterns of our framework to match cassandra instead 
of the other way around :)

Thanks again for your time
Kind regards

-- 
Mikael Wikblom
Software Architect
SiteVision AB
019-217058
mikael.wikblom@sitevision.se
http://www.sitevision.se


Mime
View raw message