Date: Mon, 22 Nov 2010 17:30:47 -0800
Subject: Re: cassandra vs hbase summary (was facebook messaging)
From: David Jeske
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016e6dab1788266120495ae523b

--0016e6dab1788266120495ae523b
Content-Type: text/plain; charset=ISO-8859-1

> My point still applies though. Caching HFile blocks on a single node
> vs individual "dataums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.

I seem to have missed this the first time around. Next time I correct the summary I'll include something about the subtleties of block vs record caching. If you access sparse/random rows, and the rows are small, record caching on multiple machines may in fact be more efficient than block caching on fewer machines.

That said, the story for pinning ranges of data in memory doesn't seem to change.

Another interesting difference has to do with scan vs seek performance. There was one comment about Cassandra possibly having better seek performance than HBase because of some HDFS slowness, for which a fix was rumored to be in the works. Does anyone have any other comments about scan or seek performance comparisons?

Again, I understand Cassandra is not HBase. However, it's useful to be able to compare them (and their designs), so people can understand what might help them choose one over the other.

Thanks again!
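
To put rough numbers on the block vs record caching point above, here is a back-of-envelope sketch. Everything in it is an assumption chosen for illustration (the row size, the block size, the number of hot rows, and the function names), and it ignores per-entry overhead in either cache. It only shows why small, sparsely accessed rows can favor record caching, while dense ranges come out about even.

```python
# Back-of-envelope sketch: memory needed to keep a set of hot rows cached.
# All sizes are illustrative assumptions, not measured HBase or Cassandra values.

def cache_bytes_block(hot_rows: int, block_size: int, hot_rows_per_block: float) -> float:
    """Block cache: every block holding at least one hot row is cached whole."""
    blocks_needed = hot_rows / hot_rows_per_block
    return blocks_needed * block_size

def cache_bytes_record(hot_rows: int, row_size: int) -> int:
    """Record cache: only the hot rows themselves are cached."""
    return hot_rows * row_size

hot_rows = 1_000_000    # assumed number of frequently read rows
row_size = 200          # bytes, assumed small rows
block_size = 64 * 1024  # bytes, assumed block size

# Sparse/random access: assume each cached block contains only one hot row.
sparse_block = cache_bytes_block(hot_rows, block_size, hot_rows_per_block=1)
# Dense/range access: assume every row in a cached block is hot.
dense_block = cache_bytes_block(
    hot_rows, block_size, hot_rows_per_block=block_size / row_size
)
record = cache_bytes_record(hot_rows, row_size)

print(f"record cache:        {record / 2**20:.0f} MiB")
print(f"block cache, sparse: {sparse_block / 2**20:.0f} MiB")
print(f"block cache, dense:  {dense_block / 2**20:.0f} MiB")
```

Under these assumptions the sparse case costs the block cache far more memory per hot row, while the dense case needs about the same memory either way, which is consistent with the range-pinning story not changing.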
--0016e6dab1788266120495ae523b--