From: Ryan King
Date: Wed, 24 Aug 2011 15:11:50 -0700
Subject: Re: Memory overhead of vector clocks…. how often are they pruned?
To: user@cassandra.apache.org

We did have a Clock construct for a while, but it never made it into a released version (afaik). We thought about using them for counters.

Timestamps are endemic to the data model and therefore can never be pruned. Cassandra basically trades memory for availability here.

-ryan

On Wed, Aug 24, 2011 at 10:54 AM, Jeremy Hanna wrote:
> At the point that book was written (about a year ago it was finalized), vector clocks were planned. In August or September of last year, they were removed. 0.7 was released in January. The ticket for vector clocks is here and you can see the reasoning for not using them at the bottom.
> https://issues.apache.org/jira/browse/CASSANDRA-580
>
> On Aug 24, 2011, at 12:41 PM, Kevin Burton wrote:
>
>> This is really interesting… I can track it down but there are a number of references to Cassandra HAVING vector clocks … which would make sense that I can't find out how much memory they are using :-P
>>
>> "Cassandra: The Definitive Guide" … which I was reading the other night says that they were introduced in 0.7 but that they're still figuring out what to do with them:
>>
>> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA50&lpg=PA50&dq=Cassandra's+clock+was+introduced+in+version+0.7,+but+its+fate+is+uncertain&source=bl&ots=XoQz3tFa1C&sig=Lhdu5j1xRcTPmP4-YQONhxzfRTU&hl=en&ei=MzdVTurWEJTSiAKU5vXoDA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBkQ6AEwAA#v=onepage&q&f=false
>>
>> … so… are 'timestamps' pruned?
>>
>> Even this mechanism seems like it will dominate the amount of memory used in Cassandra. I could see many installs requiring 2-3x more memory to run Cassandra unless there is a pruning mechanism or some way to minimize their use.
>>
>> Kevin
>>
>>
>> On Wed, Aug 24, 2011 at 9:05 AM, Ryan King wrote:
>> On Tue, Aug 23, 2011 at 7:58 PM, Kevin Burton wrote:
>> I had a thread going the other day about vector clock memory usage and that it is a series of (clock id, clock):ts and the ability to prune old entries … I'm specifically curious here how often old entries are pruned.
>>
>> If you're storing small columns within cassandra. Say just an integer. The vector clock overhead could easily use up far more data than is actually in your database.
>>
>> However, if they are pruned, then this shouldn't really be a problem.
>>
>> How much memory is this wasting?
>>
>> I think there is some confusion here– cassandra doesn't use vector clocks.
>>
>> -ryan
>>
>> Thoughts?
>>
>>
>> Jonathan Ellis jbellis@gmail.com to user
>> Aug 19 (4 days ago)
>> The problem with naive last write wins is that writes don't always
>> arrive at each replica in the same order. So no, that's a
>> non-starter.
>>
>> Vector clocks are a series of (client id, clock) entries, and usually
>> a timestamp so you can prune old entries. Obviously implementations
>> can vary, but to pick a specific example, Voldemort [1] uses 2 bytes
>> per client id, a variable number (at least one) of bytes for the
>> clock, and 8 bytes for the timestamp.
>>
>> [1] https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java
>>
>>
>> --
>> Founder/CEO Spinn3r.com
>>
>> Location: San Francisco, CA
>> Skype: burtonator
>> Skype-in: (415) 871-0687
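
For a rough sense of the sizes discussed in this thread, here is a minimal Python sketch that plugs in the Voldemort figures quoted above (2 bytes per client id, at least 1 byte per clock, 8 bytes for the timestamp). The function names are made up for illustration, and it assumes one timestamp per vector clock rather than one per entry and ignores any serialization header, so treat the numbers as a back-of-the-envelope estimate, not Voldemort's actual on-disk size. None of this applies to Cassandra itself, which does not use vector clocks.

```python
# Byte sizes taken from the Voldemort example quoted in this thread.
CLIENT_ID_BYTES = 2   # per (client id, clock) entry
MIN_CLOCK_BYTES = 1   # the clock is variable width, at least one byte
TIMESTAMP_BYTES = 8   # ASSUMPTION: stored once per vector clock


def vector_clock_bytes(num_entries, clock_bytes=MIN_CLOCK_BYTES):
    """Estimate the size of one vector clock (hypothetical helper)."""
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    if clock_bytes < MIN_CLOCK_BYTES:
        raise ValueError("clock_bytes must be at least 1")
    return num_entries * (CLIENT_ID_BYTES + clock_bytes) + TIMESTAMP_BYTES


def overhead_ratio(num_entries, value_bytes, clock_bytes=MIN_CLOCK_BYTES):
    """Clock size divided by the size of the value it versions."""
    if value_bytes <= 0:
        raise ValueError("value_bytes must be positive")
    return vector_clock_bytes(num_entries, clock_bytes) / value_bytes


if __name__ == "__main__":
    # A 4-byte integer value, as in the "just an integer" case above.
    for entries in (1, 3, 10):
        size = vector_clock_bytes(entries)
        ratio = overhead_ratio(entries, value_bytes=4)
        print(f"{entries} entries: {size} bytes, {ratio:.2f}x the value size")
```

Under these assumptions, even a clock with a handful of entries is several times larger than a 4-byte value, which is the concern raised in the original question.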