Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CA48810F88 for ; Wed, 2 Oct 2013 03:13:37 +0000 (UTC) Received: (qmail 1421 invoked by uid 500); 2 Oct 2013 03:13:33 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 1410 invoked by uid 500); 2 Oct 2013 03:13:33 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 1402 invoked by uid 99); 2 Oct 2013 03:13:32 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Oct 2013 03:13:32 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy includes SPF record at spf.trusted-forwarder.org) Received: from [209.85.220.44] (HELO mail-pa0-f44.google.com) (209.85.220.44) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Oct 2013 03:13:25 +0000 Received: by mail-pa0-f44.google.com with SMTP id lf10so419277pab.31 for ; Tue, 01 Oct 2013 20:13:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:content-type:message-id:mime-version :subject:date:references:to:in-reply-to; bh=N79tvF7UO7ELpNv+1jomq1y6mhbShY80MTM4V+/Q2GY=; b=AklZu0REsJ5waGaIPT1foxdOAuUvEa4xM9eem2hYAIR2eSfA67aXBAyOiiIFyEhff8 +TGrP0ov1UKVCEi1sZ0fBe9p+vA/r26ffZZ/pHWQgeUX48ZhwYgZyH+8T6OFC8jwj4s3 wAaoYTRqvNrY/wpr0ws/O4uvkRIQgN5UEE/btSEc+uciYpMdigm28i8o9c813mmz6Cki H5kOhb+13fFjo59m2Z9wjYeoUyXQSX/Jl1LCYQmu2v+eEz62Wnt1/hQYhKW8HSYRbNTD wMEFTByM82N2weEq6ldRsG9hfcgXR/M+m7faylgF/bV/DRQA0OHEi/vGseA0Fk5rDk1E zCZg== X-Gm-Message-State: ALoCoQnXnNxA6xj0ufdz4isYcwZIXJ4a5oMVHlDJcBWivdP290FYTbMsgRPEcQIP6UUon16SYXUB X-Received: by 10.66.158.196 with SMTP id ww4mr697465pab.57.1380683584425; Tue, 01 Oct 2013 20:13:04 -0700 (PDT) Received: from [172.16.1.18] ([203.86.207.101]) by mx.google.com with ESMTPSA id tz3sm9906438pbc.20.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 01 Oct 2013 20:13:03 -0700 (PDT) From: Aaron Morton Content-Type: multipart/alternative; boundary="Apple-Mail=_D9B7D721-B20C-4D46-AF96-B155558CB802" Message-Id: Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: 2.0.1 counter replicate on write error Date: Wed, 2 Oct 2013 16:12:59 +1300 References: <01f201cebb6f$5a5050a0$0ef0f1e0$@struq.com> <74827C78-CEC1-4FD9-91B9-55322D15A6E0@thelastpickle.com> <011c01cebdc1$7bf848d0$73e8da70$@struq.com> To: user@cassandra.apache.org In-Reply-To: <011c01cebdc1$7bf848d0$73e8da70$@struq.com> X-Mailer: Apple Mail (2.1510) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail=_D9B7D721-B20C-4D46-AF96-B155558CB802 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 > Thanks Aaron, I=92ve added to the ticket. We were not running on TRACE = logging. Thanks.=20 The only work around I can think of is using nodetool scrub. That will = read the -Data.db file and re-write it and the other components.=20 Remember to snapshot first for roll back.=20 Cheers ----------------- Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 30/09/2013, at 10:43 PM, Christopher Wirt = wrote: > Thanks Aaron, I=92ve added to the ticket. We were not running on TRACE = logging. > =20 > From: Aaron Morton [mailto:aaron@thelastpickle.com]=20 > Sent: 30 September 2013 08:37 > To: user@cassandra.apache.org > Subject: Re: 2.0.1 counter replicate on write error > =20 > ERROR [ReplicateOnWriteStage:19] 2013-09-27 10:17:14,778 = CassandraDaemon.java (line 185) Exception in thread = Thread[ReplicateOnWriteStage:19,5,main] > java.lang.AssertionError: DecoratedKey(-1754949563326053382, = a414b0c07f0547f8a75410555716ced6) !=3D = DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0) in = /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831= 953-Data.db > at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNa= mesIterator.java:114) > at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTable= NamesIterator.java:62) > =20 > When reading from an SSTable the position returned from the -Index.db = / KEYS cache pointed to a row in the -Data.db component that was for a = different row.=20 > =20 > DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0) > Was what we were searching for > =20 > DecoratedKey(-1754949563326053382, a414b0c07f0547f8a75410555716ced6) > Is what was found in the data component.=20 > =20 > The first part is the Token (M3 hash) the second is the key. It looks = like a collision, but it could also be a bug somewhere else.=20 > =20 > Code in SSTableReader.getPosition() points to = https://issues.apache.org/jira/browse/CASSANDRA-4687 and adds an = assertion that is only trigger if TRACE logging is running. Can you add = to the 4687 ticket and update the thread ?=20 > =20 > Cheers > =20 > ----------------- > Aaron Morton > New Zealand > @aaronmorton > =20 > Co-Founder & Principal Consultant > Apache Cassandra Consulting > http://www.thelastpickle.com > =20 > On 27/09/2013, at 10:50 PM, Christopher Wirt = wrote: >=20 >=20 > Hello, > =20 > I=92ve started to see a slightly worrying error appear in our logs = occasionally. We=92re writing at 400qps per machine and I only see this = appear every 5-10minutes. > =20 > Seems to have started when I switched us to using the hsha thrift = server this morning. We=92ve been running 2.0.1 ran off the sync thrift = server since yesterday without seeing this error. But might not be = related. > =20 > There are some machines in another DC still running 1.2.10. > =20 > Anyone seen this before? Have any insight? > =20 > ERROR [ReplicateOnWriteStage:19] 2013-09-27 10:17:14,778 = CassandraDaemon.java (line 185) Exception in thread = Thread[ReplicateOnWriteStage:19,5,main] > java.lang.AssertionError: DecoratedKey(-1754949563326053382, = a414b0c07f0547f8a75410555716ced6) !=3D = DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0) in = /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831= 953-Data.db > at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNa= mesIterator.java:114) > at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTable= NamesIterator.java:62) > at = org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(N= amesQueryFilter.java:87) > at = org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryF= ilter.java:62) > at = org.apache.cassandra.db.CollationController.collectAllData(CollationContro= ller.java:249) > at = org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationCo= ntroller.java:53) > at = org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyS= tore.java:1468) > at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStor= e.java:1294) > at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:332) > at = org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCom= mand.java:55) > at = org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMut= ation.java:100) > at = org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.jav= a:1107) > at = org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StoragePro= xy.java:1897) > at = java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:= 1145) > at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :615) > at java.lang.Thread.run(Thread.java:724) --Apple-Mail=_D9B7D721-B20C-4D46-AF96-B155558CB802 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=windows-1252
Thanks = Aaron, I=92ve added to the ticket. We were not running on TRACE = logging.

The only work around I can think of is using = nodetool scrub. That will read the -Data.db file and re-write it and the = other components. 

Remember to snapshot first for roll = back. 


Cheers

http://www.thelastpickle.com

On 30/09/2013, at 10:43 PM, Christopher Wirt <chris.wirt@struq.com> = wrote:

Thanks = Aaron, I=92ve added to the ticket. We were not running on TRACE = logging.
 
From: Aaron Morton = [mailto:aaron@thelastpickle.com] 
Sent: 30 September 2013 = 08:37
To: user@cassandra.apache.orgSubject: Re: = 2.0.1 counter replicate on write = error
ERROR = [ReplicateOnWriteStage:19] 2013-09-27 10:17:14,778 CassandraDaemon.java = (line 185) Exception in thread = Thread[ReplicateOnWriteStage:19,5,main]
=
java.lang.AssertionError: = DecoratedKey(-1754949563326053382, a414b0c07f0547f8a75410555716ced6) !=3D = DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0) in = /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831= 953-Data.db
        at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(S= STableNamesIterator.java:62)
 
When reading from an SSTable the position returned = from the -Index.db / KEYS cache pointed to a row in the -Data.db = component that was for a different row. 
 
DecoratedKey(-1754949563326053382, = aeadcec8184445d4ab631ef4250927d0)
Was what we were searching = for
Is what was found in the data = component. 
The = first part is the Token (M3 hash) the second is the key. It looks like a = collision, but it could also be a bug somewhere = else. 
Code = in SSTableReader.getPosition() points to Aaron = Morton
Seems to have started when I switched us to using = the hsha thrift server this morning. We=92ve been running 2.0.1 ran off = the sync thrift server since yesterday without seeing this error. =  But might not be related.
 
There are some machines in another DC still = running 1.2.10.
 
Anyone seen this before? Have any = insight?
java.lang.AssertionError: = DecoratedKey(-1754949563326053382, a414b0c07f0547f8a75410555716ced6) !=3D = DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0) in = /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831= 953-Data.db
        at = org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(S= STableNamesIterator.java:62)
        at = org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(N= amesQueryFilter.java:87)
        at = org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryF= ilter.java:62)
        at = org.apache.cassandra.db.CollationController.collectAllData(CollationContro= ller.java:249)
        at = org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationCo= ntroller.java:53)
        at = org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyS= tore.java:1468)
        at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStor= e.java:1294)
        at = org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMut= ation.java:100)
        at = org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.jav= a:1107)
        at = java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:= 1145)