From: Alex Kotelnikov
Date: Thu, 17 Aug 2017 19:36:54 +0300
Subject: Re: Full table scan with cassandra
To: user@cassandra.apache.org

Dor,

I believe I tried it in many ways, and the result is quite disappointing.

I ran my scans on 3 different clusters, one of which was running on VMs so that I could scale it up and down (3-5-7 VMs, 8 to 24 cores) to see how this affects performance.

I also generated the load from a Spark cluster, ranging from 4 to 40 parallel tasks, as well as from a plain multi-threaded client.

The surprise is that a trivial fetch of all records using token ranges takes pretty much the same time in all setups.

The only beneficial thing I've learned is that it is much more efficient to create a MATERIALIZED VIEW than to filter (even using a secondary index).

Say I have a typical dataset: around 3 GB of data, 1M records. And I have a trivial scan query:

String.format("SELECT token(user_id), user_id, events FROM user_events WHERE token(user_id) >= %d ", start)
    + (end != null ? String.format(" AND token(user_id) < %d ", end) : "")

I split all tokens into start-end ranges (except for the last range, which only has a start) and query the ranges in multiple threads, up to 40.

The whole process takes ~40s on a 3-VM cluster (2+2+4 cores, 16 GB RAM, one virtual disk each). And it takes ~30s on a real hardware cluster of 8 servers * 8 cores * 32 GB. The level of concurrency hardly matters at all, unless it is too high or too low. The size of the token ranges does matter, but there I follow the rule "make it larger, but avoid Cassandra timeouts".

I also tried the Spark connector to validate that my multi-threaded test app is not the bottleneck. It is not.

I expected some kind of elasticity; I see none. Feels like I am doing something wrong...

On 17 August 2017 at 00:19, Dor Laor wrote:
> Hi Alex,
>
> You probably didn't get the parallelism right. A serial scan has
> a parallelism of one.
> If the parallelism isn't large enough, perf will be slow.
> If parallelism is too large, Cassandra and the disk will thrash and have too
> many context switches.
>
> So you need to find your cluster's sweet spot. We documented the procedure
> to do it in this blog: http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/
> and the results are here: http://www.scylladb.com/2017/03/28/parallel-efficient-full-table-scan-scylla/
> The algorithm should translate to Cassandra, but you'll have to use
> different rules of thumb.
>
> Best,
> Dor
>
> On Wed, Aug 16, 2017 at 9:50 AM, Alex Kotelnikov <alex.kotelnikov@diginetica.com> wrote:
>
>> Hey,
>>
>> we are trying Cassandra as an alternative for storing a huge stream of data
>> coming from our customers.
>>
>> Storing works quite fine, and I started to validate how retrieval does.
>> We have two types of retrieval: fetching specific records and bulk retrieval
>> for general analysis.
>> Fetching a single record works like a charm. But it is not so with bulk fetch.
>>
>> With a moderately small table of ~2 million records, ~10 GB raw data, I
>> observed very slow operation (using token(partition key) ranges). It takes
>> minutes to perform a full retrieval. We tried a couple of configurations
>> using virtual machines and real hardware, and overall it looks like it is not
>> possible to fetch all table data in a reasonable time (by reasonable I mean
>> that since we have a 1 Gbit network, 10 GB can be transferred in a couple of
>> minutes from one server to another, and when we have 10+ Cassandra servers
>> and 10+ Spark executors the total time should be even smaller).
>>
>> I tried the Datastax Spark connector. I also wrote a simple test case using
>> the Datastax Java driver and saw that a fetch of 10k records takes ~10s, so I
>> assume that a "sequential" scan will take 200x more time, i.e. ~30 minutes.
>>
>> Maybe we are totally wrong in trying to use Cassandra this way?
>>
>> --
>> Best Regards,
>>
>> *Alexander Kotelnikov*
>> *Team Lead*
>>
>> DIGINETICA
>> Retail Technology Company
>>
>> m: +7.921.915.06.28
>>
>> *www.diginetica.com*

--
Best Regards,

*Alexander Kotelnikov*
*Team Lead*

DIGINETICA
Retail Technology Company

m: +7.921.915.06.28

*www.diginetica.com*
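[Editor's note: the token-range splitting described in this thread can be sketched as below. This is a minimal, self-contained sketch, not the poster's actual code: the `user_events` query string is taken verbatim from the message, the ring bounds assume Cassandra's default Murmur3Partitioner ([-2^63, 2^63 - 1]), and the splitting logic (equal-width ranges, last range open-ended) follows the message's description.]

```java
import java.util.ArrayList;
import java.util.List;

public class TokenRangeScan {

    // Split the full Murmur3 token ring [-2^63, 2^63 - 1] into n contiguous
    // [start, end) ranges; the last range has no upper bound, as in the message.
    static List<Long[]> splitRing(int n) {
        List<Long[]> ranges = new ArrayList<>();
        long step = Long.divideUnsigned(-1L, n); // ~2^64 / n, computed unsigned
        long start = Long.MIN_VALUE;
        for (int i = 0; i < n; i++) {
            Long end = (i == n - 1) ? null : start + step; // wraps modularly
            ranges.add(new Long[]{start, end});
            if (end != null) start = end;
        }
        return ranges;
    }

    // Build the CQL for one range, exactly as in the message.
    static String rangeQuery(long start, Long end) {
        return String.format(
                "SELECT token(user_id), user_id, events FROM user_events WHERE token(user_id) >= %d ", start)
                + (end != null ? String.format(" AND token(user_id) < %d ", end) : "");
    }

    public static void main(String[] args) {
        // Each resulting query would be submitted on its own thread
        // (the poster used up to 40) against a driver session.
        for (Long[] r : splitRing(4)) {
            System.out.println(rangeQuery(r[0], r[1]));
        }
    }
}
```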