From: Radim Kolar <hsn@sendmail.cz>
Date: Mon, 28 Nov 2011 14:44:21 +0100
To: user@cassandra.apache.org
Subject: cassandra read performance on large dataset
> I understand that my computer may not be as powerful as those used in the other benchmarks,
> but it shouldn't be that far off (1:30), right?
Cassandra has very fast writes; read:write throughput ratios like 1:1000 are possible.

Pure read workload on 1 billion rows, without key or row cache, on a 2-node cluster:
Running workload in 10 threads 1000 ops each.
Workload took 88.59 seconds, thruput 112.88 ops/sec

Each node can do about 240 IOPS, so the cluster delivers roughly 480 IOPS; at 112.88 reads/sec that works out to about 4 IOPS per read on a cold system.
After the OS cache warms up enough to hold the indirect seek blocks, it gets close to ideal:
Workload took 79.76 seconds, thruput 200.59 ops/sec
Ideal Cassandra read performance without caches is 2 IOPS per read: one I/O to read the index, a second to read the data.
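For clarity, those per-read IOPS figures follow directly from the numbers above; a quick back-of-the-envelope check in Python, assuming both nodes serve reads evenly and disk seeks are the only cost:

    # Back-of-the-envelope check of the IOPS-per-read figures above.
    # Assumes both nodes serve reads evenly and disk seeks are the only cost.
    NODE_IOPS = 240                  # measured per-node disk IOPS
    CLUSTER_IOPS = 2 * NODE_IOPS     # 2-node cluster

    cold_throughput = 112.88         # reads/sec with cold OS cache
    warm_throughput = 200.59         # reads/sec after OS cache warms up

    print(CLUSTER_IOPS / cold_throughput)  # ~4.3 IOPS per read (cold)
    print(CLUSTER_IOPS / warm_throughput)  # ~2.4 IOPS per read, near the 2-IOPS ideal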

pure write workload:
Running workload in 40 threads 100000 ops each.
Workload took 302.51 seconds, thruput 13222.62 ops/sec
Writes are slower here than they could be because the nodes are running out of memory, most likely due to memory leaks in the 1.0 branch. Also, writes in this test are not batched; a batching sketch follows below.
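For illustration, this is roughly what batching those inserts would look like with the pycassa client; the keyspace, column family and host names here are made up for the example:

    # Sketch of batched inserts using pycassa; keyspace/CF/host are hypothetical.
    import pycassa

    pool = pycassa.ConnectionPool('TestKeyspace', server_list=['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'TestCF')

    # batch() buffers mutations client-side and flushes them in groups, so each
    # Thrift round trip carries many rows instead of one.
    with cf.batch(queue_size=200) as b:
        for i in range(100000):
            b.insert('row%d' % i, {'col': 'value%d' % i})
    # leaving the "with" block sends any remaining buffered mutations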

Cassandra is really awesome for its price tag. Getting similar numbers from Oracle would cost far too much: for one 2-core Oracle licence suitable for processing large data you can get about 8 Cassandra nodes, and don't forget that Oracle needs some hardware too. Transactions are not always needed for data warehousing; if you are importing chunks of data, you do not need rollbacks, just schedule failed chunks for later processing. If you can write your application to work without transactions, Cassandra is the way to go.

Hadoop and Cassandra are very good products for working with large data, basically for just the price of learning a new technology. Usually Cassandra is deployed first; it is easy to get running and day-to-day operations are simple. Hadoop follows later, after discovering that Cassandra is not really suitable for large batch jobs because it needs random access for reading data.

We finished migrating from a commercial SQL database to Hadoop/Cassandra in 3 months; not only does it cost 10x less, we are able to process about 100 times larger datasets. Our largest dataset has 1200 billion rows.

Problems with this setup are:
   bloom filters use too much memory; they should be configurable for applications where read performance is unimportant (see the sizing sketch after this list)
   node startup is really slow
   data loaded into Cassandra is about 2 times bigger than the CSV export (not really a problem, disk space is cheap, but the per-row overhead is quite high)
   writing applications is harder than coding for an SQL backend; Hadoop is much harder to use than Cassandra
   lack of good import/export tools for Cassandra, and especially a lack of monitoring
   you must know workarounds for Hadoop bugs; Hadoop is not easy to use efficiently
   index overhead is too big (about 100% slower) compared to index overhead in SQL databases (about 20% slower)
   no delete over index
   repair is slow
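A rough sizing sketch for the bloom filter point above, using the standard formula bits_per_key = -ln(p) / ln(2)^2; the false-positive rates are illustrative, not Cassandra's defaults, and this ignores that each SSTable carries its own filter, so real usage is higher:

    # Rough bloom filter memory estimate for 1200 billion keys.
    # False-positive rates are illustrative, not Cassandra's actual defaults.
    import math

    def bloom_bytes(keys, fp_rate):
        bits_per_key = -math.log(fp_rate) / (math.log(2) ** 2)
        return keys * bits_per_key / 8

    keys = 1200e9  # 1200 billion rows
    for fp in (0.01, 0.1, 0.5):
        print('fp=%.2f -> %.2f TB' % (fp, bloom_bytes(keys, fp) / 1e12))
    # ~1.4 TB at 1%, ~0.7 TB at 10%, ~0.2 TB even at 50% - hence the wish for a
    # per-CF knob when read performance does not matter.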
