Subject: Re: cassandra vs hbase summary (was facebook messaging)
From: Edward Capriolo
To: user@cassandra.apache.org
Date: Mon, 22 Nov 2010 17:39:57 -0500

On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon wrote:
> On Mon, Nov 22, 2010 at 1:58 PM, David Jeske wrote:
>>
>> On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon wrote:
>>>
>>> Not quite. The replica synchronization code is pretty messy, but
>>> basically it will take the longest replica that may have been synced, not a
>>> quorum.
>>> i.e. the guarantee is that "if you successfully sync() data, it will be
>>> present after replica synchronization". Unsynced data *may* be present after
>>> replica synchronization.
>>> But keep in mind that recovery is blocking in most cases - i.e. if the RS
>>> is writing to a pipeline and waiting on acks, and one of the nodes in the
>>> pipeline dies, then it will recover the pipeline (without the dead node) and
>>> continue syncing to the remaining two nodes. The client is still blocked at
>>> this point.
>>
>> I see. So it sounds like my statement #1 was wrong. Will the RS ever
>> timeout the write and fail in the face of not being able to push it to HDFS?
>> Is it correct to say:
>> Once a write is issued to HBase, it will either catastrophically fail (i.e.
>> disconnect), in which case the write will either have failed or succeeded,
>> and if it succeeded, future reads will always show that write?
>> As opposed to
>> Cassandra, which in all configurations where reads allow a subset of all
>> nodes, can "fail" a write while having the write show a temporary period of
>> inconsistency (depending on who you talk to), followed by the write either
>> applying or not applying depending on whether or not it actually wrote a
>> single node during the "failure to meet the write consistency request"?
>
> Yes, this seems accurate to me.
>
>> Does Cassandra have any return result which distinguishes between these
>> two states:
>> 1 - your data was not written to any nodes (true failure)
>> 2 - your data was written to at least 1 node, but not enough to meet your
>> write-consistency count
>> ?

David,

Return messages such as "your data was written to at least 1 node but not
enough to meet your write-consistency count" do not help the situation: the
client that wrote the data would be aware of the inconsistency, but the other
clients would not. Thus it only makes sense to pass or fail entirely.
(Though it could be an interesting error message.)

Right, CASSANDRA-1314 only solves the memory overhead issue.

Another twist to throw into the "losing writes" conversation is that file
systems can lose writes as well :) unless you are choosing synchronous
options that most people (IMHO) do not use.

@Todd: good catch about caching HFile blocks. My point still applies, though:
caching HFile blocks on a single node vs. individual datums on N nodes may
not be more efficient, so terms like "slower" and "less efficient" can be
very misleading. Isn't caching only the item more efficient? In cases with
high random read, is evicting single keys more efficient than evicting blocks
in terms of memory churn? These are difficult questions to answer absolutely,
so bullet points such as "#Cassandra has slower this" are oversimplifications
of complex problems.
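As an aside, the recovery rule Todd describes above ("take the longest replica that may have been synced, not a quorum") can be sketched in a few lines. This is heavily simplified and the names are illustrative; real HDFS recovery also negotiates generation stamps and truncates replicas to a common length:

```python
# Sketch of the block-recovery rule described above: after a pipeline
# failure, the recovered block length is the longest any surviving
# replica reached -- not a quorum agreement. Simplified illustration.

def recover_block_length(replica_lengths, last_synced):
    """replica_lengths: bytes present on each surviving replica.
    last_synced: offset of the last successful sync()."""
    target = max(replica_lengths)
    # A successful sync() means every pipeline node acked that prefix,
    # so target >= last_synced and synced data always survives...
    assert target >= last_synced
    # ...while unsynced tail bytes that reached only the longest
    # replica *may* survive recovery too.
    return target
```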
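For what it's worth, Cassandra's Thrift interface does surface something close to David's two states: UnavailableException means too few replicas were live to even attempt the write, while TimedOutException means the write may have landed on some replicas without reaching the consistency count. A toy simulation of that distinction (class and exception names are illustrative, not the real client API):

```python
# Toy model of a quorum write, distinguishing the two failure states
# David asks about. Names are illustrative only; they echo (but are
# not) Cassandra's UnavailableException / TimedOutException.

class UnavailableError(Exception):
    """Too few live replicas: the write was never attempted anywhere."""

class TimedOutError(Exception):
    """Consistency count not reached in time: the write MAY exist on
    some replicas, visible to other clients despite the 'failure'."""
    def __init__(self, acks):
        super().__init__(f"only {acks} ack(s) received")
        self.acks = acks

class Replica:
    def __init__(self, alive=True, responsive=True):
        self.alive = alive            # node is up and reachable
        self.responsive = responsive  # node acks within the timeout
        self.data = {}

def quorum_write(replicas, key, value, required):
    live = [r for r in replicas if r.alive]
    if len(live) < required:
        # State 1: rejected up front -- no node received the write.
        raise UnavailableError(f"{len(live)} live < {required} required")
    acks = 0
    for r in live:
        r.data[key] = value           # the mutation lands on this replica...
        if r.responsive:
            acks += 1                 # ...but only responsive nodes ack in time
    if acks < required:
        # State 2: written to >= 1 node, below the consistency count.
        raise TimedOutError(acks)
    return acks
```

Note that in state 2 the "failed" write is already sitting on a replica, which is exactly why reporting the distinction to the writing client alone does not fix anything for the other clients.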
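To make the block-vs-datum caching point concrete, here is the back-of-envelope arithmetic for the high-random-read case. All sizes here are invented round numbers for illustration, not measurements:

```python
# Back-of-envelope comparison of caching whole HFile blocks versus
# caching individual rows, for a random-read workload where hot keys
# are scattered across distinct blocks. Hypothetical sizes throughout.

BLOCK = 64 * 1024      # assumed block-cache granularity (bytes)
ROW = 100              # assumed average row size (bytes)
HOT_ROWS = 1000        # random hot keys, each in a different block
REPLICAS = 3           # a row cache holds the row on every replica

block_cache = HOT_ROWS * BLOCK          # one full block pinned per hot row
row_cache = HOT_ROWS * ROW * REPLICAS   # just the row, times N nodes

print(f"block cache: {block_cache / 2**20:.1f} MiB")
print(f"row cache (x{REPLICAS} nodes): {row_cache / 2**10:.1f} KiB")
```

Under these (invented) numbers the per-row cache wins by orders of magnitude even after paying for N replicas; with hot keys clustered in few blocks, or large rows, the comparison flips. That is the sense in which a flat "slower / less efficient" bullet hides the workload dependence.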