From: DuyHai Doan
Date: Tue, 1 Oct 2019 23:48:48 +0200
Subject: Re: Cluster sizing for huge dataset
To: user

The client wants to be able to access cold data (2 years old) in the same cluster, so moving data to another system is not possible.

However, since we're using DataStax Enterprise, we can leverage Tiered Storage and store old data on spinning disks to save on hardware.

Regards

On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau wrote:
>
> Hi,
> Depending on the use case, you may also consider storage tiering, with fresh data on a hot tier (Cassandra) and older data on a cold tier (Spark/Parquet or Presto/Parquet). It would be a lot more complex, but it may fit the budget better, and you may be able to reuse some tech already present in your environment.
> You may even subsample during the transformation that offloads data from Cassandra, keeping one point out of 10 for older data, if subsampling makes sense for your data signal.
>
> Regards
> Julien
>
> On Mon, Sep 30, 2019 at 22:03, DuyHai Doan wrote:
>>
>> Thanks all for your replies.
>>
>> The target deployment is on Azure, so with the nice disk snapshot feature, replacing a dead node is easier: no streaming from Cassandra.
>>
>> About compaction overhead, using TWCS with a 1-day bucket and removing read repair and subrange repair should be sufficient.
>>
>> Now the only remaining issue is QUORUM reads, which trigger read repair automagically.
>>
>> Before 4.0 there is no flag to turn it off, unfortunately.
>>
>> On Sep 30, 2019 15:47, "Eric Evans" wrote:
>>
>> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa wrote:
>>
>> [ ... ]
>>
>> > 2) The 2TB guidance is old and irrelevant for most people; what you really care about is how fast you can replace the failed machine.
>> >
>> > You’d likely be OK going significantly larger than that if you use a few vnodes, since that’ll help rebuild faster (you’ll stream from more sources on rebuild).
>> >
>> > If you don’t want to use vnodes, buy big machines and run multiple Cassandra instances on them; it’s not hard to run 3-4TB per instance and 12-16TB of SSD per machine.
>>
>> We do this too. It's worth keeping in mind, though, that you'll still
>> have a 12-16TB blast radius in the event of a host failure. As the
>> host density goes up, consider steps to make the host more robust
>> (RAID, redundant power supplies, etc.).
>>
>> --
>> Eric Evans
>> john.eric.evans@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
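Julien's idea of keeping one point out of 10 for older data while offloading it from Cassandra can be sketched as below. This is a minimal illustration only, assuming a hypothetical in-memory series of `(epoch_seconds, value)` tuples sorted by time and a hypothetical `cutoff` separating fresh from old data; it uses plain decimation (dropping points), which is only one way to subsample. Averaging or min/max per bucket would preserve more of the signal, at the cost of changing the stored values.

```python
from typing import Iterable, List, Tuple

# Hypothetical point schema: (epoch seconds, measured value).
Point = Tuple[int, float]


def subsample_old_points(points: Iterable[Point], cutoff: int, keep_every: int = 10) -> List[Point]:
    """Keep every point at or after `cutoff`; keep 1 of every `keep_every` older points.

    Assumes `points` is sorted by timestamp. Older points that are not kept
    are simply dropped (decimation), not aggregated.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    kept: List[Point] = []
    old_index = 0  # position of the current point among points older than the cutoff
    for ts, value in points:
        if ts >= cutoff:
            kept.append((ts, value))  # fresh data is kept at full resolution
        else:
            if old_index % keep_every == 0:
                kept.append((ts, value))  # keep the 1st, 11th, 21st, ... old point
            old_index += 1
    return kept


if __name__ == "__main__":
    series = [(t, float(t)) for t in range(30)]
    # Timestamps 0..19 count as "old", 20..29 as "fresh".
    result = subsample_old_points(series, cutoff=20, keep_every=10)
    print([ts for ts, _ in result])  # [0, 10, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
```

In a real offload job the same filter would run inside whatever transformation writes to the cold tier (Spark, for instance); the function above only shows the selection rule.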
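The "TWCS with a 1-day bucket" setup from the thread corresponds to table compaction options along the lines of `'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1`. The sketch below shows, as I understand the Cassandra source, how such a window is derived: an SSTable is grouped by the window containing its maximum timestamp, and DAYS windows are aligned to multiples of the window size counted from the Unix epoch (UTC), not to local midnight. The helper name `twcs_window_bounds` is hypothetical and works in seconds for readability; treat it as an illustration of the arithmetic, not a copy of Cassandra's code.

```python
from datetime import datetime, timezone
from typing import Tuple

SECONDS_PER_DAY = 86_400


def twcs_window_bounds(timestamp_s: int, window_days: int = 1) -> Tuple[int, int]:
    """Return the (lower, upper) epoch-second bounds of the TWCS-style window holding timestamp_s.

    Windows are aligned to multiples of `window_days` days since the Unix epoch (UTC).
    """
    if window_days < 1:
        raise ValueError("window_days must be >= 1")
    size = SECONDS_PER_DAY * window_days
    lower = timestamp_s - (timestamp_s % size)  # floor to the start of the window
    upper = lower + size - 1                    # last second inside the window
    return lower, upper


if __name__ == "__main__":
    ts = int(datetime(2019, 10, 1, 21, 48, 59, tzinfo=timezone.utc).timestamp())
    lower, upper = twcs_window_bounds(ts)
    # 2019-10-01T00:00:00+00:00 .. 2019-10-01T23:59:59+00:00
    print(datetime.fromtimestamp(lower, tz=timezone.utc).isoformat(), "..",
          datetime.fromtimestamp(upper, tz=timezone.utc).isoformat())
```

With 1-day windows, data written on the same UTC day lands in the same window, so once a day has passed its SSTables stop being compacted together with newer data; this is why TWCS keeps compaction overhead low for append-only time series.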
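Jeff's point that vnodes help "rebuild faster (you'll stream from more sources)" can be made concrete with back-of-envelope arithmetic: replacement time is roughly the data to re-stream divided by the aggregate streaming rate, and that rate grows with the number of source nodes until the replacement node's own ingest limit caps it. The function name, parameters, and the 50 MB/s and 400 MB/s figures below are hypothetical inputs, not measurements or Cassandra defaults; in practice the per-source rate is also bounded by settings such as `stream_throughput_outbound_megabits_per_sec` in cassandra.yaml.

```python
def estimate_rebuild_hours(data_tb: float, sources: int,
                           per_source_mb_s: float, receiver_cap_mb_s: float) -> float:
    """Rough hours needed to re-stream `data_tb` terabytes onto a replacement node.

    Assumes all `sources` stream in parallel at `per_source_mb_s` until the
    replacement node's ingest limit `receiver_cap_mb_s` is reached. All rates
    are hypothetical inputs.
    """
    if data_tb < 0 or sources < 1 or per_source_mb_s <= 0 or receiver_cap_mb_s <= 0:
        raise ValueError("data size must be non-negative; sources and rates must be positive")
    effective_mb_s = min(sources * per_source_mb_s, receiver_cap_mb_s)
    data_mb = data_tb * 1_000_000  # decimal TB -> MB
    return data_mb / effective_mb_s / 3600


if __name__ == "__main__":
    # 4TB per instance, as in the 3-4TB-per-instance example from the thread.
    for n in (1, 4, 8, 16):
        hours = estimate_rebuild_hours(4, n, per_source_mb_s=50, receiver_cap_mb_s=400)
        print(f"{n:>2} sources: {hours:.1f} h")
```

The model also shows the diminishing return: once the aggregate rate hits the receiver's cap (8 sources in this example), adding more sources does not shorten the rebuild. The same arithmetic applies to Eric's blast-radius warning, since losing a whole 12-16TB host means re-streaming every instance on it.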