Date: Fri, 6 May 2016 22:20:02 +0530
Subject: Re: Read data from specific node in cassandra
From: Joseph Tech
To: user@cassandra.apache.org

Please check if nodetool getendpoints can be used, if you know the key (going by your problem description).

On 6 May 2016 22:04, "Siddharth Verma" <verma.siddharth@snapdeal.com> wrote:

@Joseph,
An incident we saw in production, and a speculation as to how it might have occurred.

A detailed description of the use case:

*Incident*
We have 2 DCs, each with three nodes, and our keyspace has RF 3 per DC. read_repair_chance is 0.0 for all the tables.
After a while (we run periodic full table scans to dump the data somewhere else), we saw corrupted data being dumped.
We copied the SSTables of all nodes of one DC to a separate cluster created for debugging.
We shut down two nodes of the replica cluster so that only one was up, and queried the possibly corrupted data in cqlsh.
What we saw: out of the three nodes of the replica, two had similar data, and one had some extra data which shouldn't have been there for that particular partition key.

*Speculation*
A possible cause we could come up with: on a particular day, one of the nodes of the production DC might have gone down, for longer than the hinted handoff window.
Say the node went down at 12 PM.
Coordinator nodes stored hints from 12 PM - 3 PM.
The node was restarted at 6 PM.
All deletions/updates from 3 PM - 6 PM never reached that particular node, and repair wasn't run on it.
After 10 days, the tombstones were deleted (gc_grace_seconds).
Now that node still has data which was missed in the deletion, while the data has been removed from the other two nodes, so we can't run repair now.

Again, this is only speculation; we are not sure.
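The arithmetic behind that timeline can be sketched in a few lines (a toy model, not driver code; the constants mirror Cassandra's defaults of a 3-hour max_hint_window_in_ms and a 10-day gc_grace_seconds, and the helper names are illustrative):

```python
from datetime import datetime, timedelta

MAX_HINT_WINDOW = timedelta(hours=3)  # default max_hint_window_in_ms (3 h)
GC_GRACE = timedelta(days=10)         # default gc_grace_seconds (864000 s)

def missed_window(down_at, up_at, hint_window=MAX_HINT_WINDOW):
    """Span of mutations the node missed entirely: coordinators stop
    storing hints once the node has been down longer than hint_window."""
    downtime = up_at - down_at
    return max(timedelta(0), downtime - hint_window)

def repair_deadline(tombstone_written_at, gc_grace=GC_GRACE):
    """Approximate last moment a repair can still propagate a tombstone:
    after gc_grace the tombstone is purged on compaction, and the lagging
    node's undeleted copy can then resurrect."""
    return tombstone_written_at + gc_grace

down = datetime(2016, 4, 20, 12, 0)  # node goes down at 12 PM
up   = datetime(2016, 4, 20, 18, 0)  # restarted at 6 PM

print(missed_window(down, up))                        # 3:00:00 (the 3 PM - 6 PM gap)
print(repair_deadline(datetime(2016, 4, 20, 15, 0)))  # a 3 PM delete + 10 days
```

Under these assumptions the repair deadline passed before the scan, which is consistent with the stale data observed on one replica.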
This is the only cause we could come up with.

@User
Back to the requirement "*Read data from specific node in cassandra*":
I prematurely stated that the whitelist worked *perfectly*. However, while scanning the data, that isn't the case; it produced an ambiguous data dump, so this option didn't work for debugging.
Could someone suggest other alternatives?
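A toy illustration (pure Python, all names made up) of why whitelisting a single coordinator does not pin a CL=ONE scan to that node's own data: the whitelisted host only coordinates, and each read may still be served by whichever replica its load balancing picks, so divergent replicas yield an ambiguous dump:

```python
import itertools

# Three replicas of the same partition; node3 missed a delete and still
# holds a stale row (the situation described above). Names are illustrative.
replica_data = {
    "node1": {"k1": ["row_a", "row_b"]},
    "node2": {"k1": ["row_a", "row_b"]},
    "node3": {"k1": ["row_a", "row_b", "stale_row"]},
}

# Round-robin stand-in for the coordinator's replica selection: even when the
# client whitelists node1, CL=ONE reads may be answered by any replica.
pick_replica = itertools.cycle(sorted(replica_data)).__next__

def scan_cl_one(key):
    return list(replica_data[pick_replica()][key])

results = [scan_cl_one("k1") for _ in range(3)]
print(results[0] == results[2])  # False: the same scan returns different answers
```

Reading one node's data in isolation (the other replicas down, as done with the debug cluster above) or inspecting that node's SSTables directly stays closer to the stated goal than a driver-side whitelist.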
