From: Aaron Morton
To: "user@cassandra.apache.org"
Subject: Re: commitlog replay missing data
Date: Thu, 14 Jul 2011 08:10:44 +1200
In-Reply-To: <008BB2800AE49940817156C538647C930A0E1A57@ex01-west.YOJOE.local>

Have you verified that the data you expect to see is not in the server after shutdown?

WRT the difference between the Memtable data size and the SSTable live size, don't believe everything you read :)

Memtable live size is increased by the serialised byte size of every column inserted, and is never decremented. Deletes and overwrites will inflate this value. What was your workload like?

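To illustrate the point about overwrites, here is a minimal sketch of a counter that only ever grows. It is a toy model, not Cassandra's code: the class `ToyMemtable` and its methods are hypothetical names made up for this example, and it assumes the "serialised size" of a column is just the name length plus the value length.

```python
class ToyMemtable:
    """Toy model of an increment-only size counter (not Cassandra's implementation)."""

    def __init__(self):
        self.columns = {}       # (row_key, column_name) -> latest value bytes
        self.reported_size = 0  # grows on every write, never decremented

    def write(self, row_key, column_name, value):
        # Assumed "serialised size": column name bytes plus value bytes.
        self.reported_size += len(column_name.encode()) + len(value)
        # An overwrite replaces the old value, but the counter is not adjusted.
        self.columns[(row_key, column_name)] = value

    def live_size(self):
        # Size of the data that would actually survive a flush in this toy model.
        return sum(len(name.encode()) + len(value)
                   for (_, name), value in self.columns.items())


m = ToyMemtable()
for _ in range(10):
    m.write("row1", "c1", bytes(100))  # same column overwritten ten times

print(m.reported_size, m.live_size())  # prints "1020 102"
```

In this sketch ten overwrites of one 100-byte column report ten times the size that is actually live, which is why the workload matters when comparing the two cfstats figures.
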
As of 0.8 we now have global memory management for cf's that tracks actual JVM bytes used by a CF.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2011, at 3:28 PM, Jeffrey Wang wrote:

> Hey all,
>
> Recently upgraded to 0.8.1 and noticed what seems to be missing data after a commitlog replay on a single-node cluster. I start the node, insert a bunch of stuff (~600MB), stop it, and restart it. There are log messages pertaining to the commitlog replay and no errors, but some of the data is missing. If I flush before stopping the node, everything is fine, and running cfstats in the two cases shows different amounts of data in the SSTables. Moreover, the amount of data that is missing is nondeterministic. Has anyone run into this? Thanks.
>
> Here is the output of a side-by-side diff between cfstats outputs for a single CF before restarting (left) and after (right). Somehow a 37MB memtable became a 2.9MB SSTable (note the difference in write count as well)?
>
> Column Family: Blocks              Column Family: Blocks
> SSTable count: 0                 | SSTable count: 1
> Space used (live): 0             | Space used (live): 2907637
> Space used (total): 0            | Space used (total): 2907637
> Memtable Columns Count: 8198     | Memtable Columns Count: 0
> Memtable Data Size: 37550510     | Memtable Data Size: 0
> Memtable Switch Count: 0         | Memtable Switch Count: 1
> Read Count: 0                      Read Count: 0
> Read Latency: NaN ms.              Read Latency: NaN ms.
> Write Count: 8198                | Write Count: 1526
> Write Latency: 0.018 ms.         | Write Latency: 0.011 ms.
> Pending Tasks: 0                   Pending Tasks: 0
> Key cache capacity: 200000         Key cache capacity: 200000
> Key cache size: 0                  Key cache size: 0
> Key cache hit rate: NaN            Key cache hit rate: NaN
> Row cache: disabled                Row cache: disabled
> Compacted row minimum size: 0    | Compacted row minimum size: 1110
> Compacted row maximum size: 0    | Compacted row maximum size: 2299
> Compacted row mean size: 0       | Compacted row mean size: 1960
>
> Note that I patched https://issues.apache.org/jira/browse/CASSANDRA-2317 in my version, but there are no deletions involved so I don’t think it’s relevant unless I messed something up while patching.
>
> -Jeffrey