From: Goutham reddy
Date: Thu, 17 Jan 2019 12:15:02 -0800
Subject: Partition key with 300K rows can it be queried and distributed using Spark
To: user@cassandra.apache.org

Hi,

Although a single partition can hold up to 2 billion cells, it is an anti-pattern to store such a huge data set under one partition key. In our case the partition has only 300K rows, yet when we query that one key we get a timeout exception.

If I use Spark to fetch the 300K rows for that key, will it avoid the timeouts and distribute the data across the Spark nodes, or will it still throw timeout exceptions? What is the best practice for retrieving the data for a key with 300K rows?

Any help is highly appreciated.

Regards,
Goutham.
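One common way to avoid a single oversized read is to page through the partition in bounded slices, resuming each query after the last clustering key seen (in CQL the shape would be `SELECT ... WHERE pk = ? AND ck > ? LIMIT ?`). The DataStax Java driver can also page automatically through its fetch size (page size) setting, which is usually the first thing to try. The sketch below only simulates manual range paging: an in-memory `TreeMap` stands in for the partition, and the names `PartitionPager`, `fetchPage`, `readAll`, `PAGE_SIZE`, and the integer clustering key are hypothetical, not part of any driver or connector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionPager {
    // Hypothetical page size; a real value would be tuned against the read timeout.
    static final int PAGE_SIZE = 5_000;

    // Simulates one bounded CQL read (assumed schema, not a real API call):
    //   SELECT ck, value FROM t WHERE pk = ? AND ck > ? LIMIT ?
    // 'partition' stands in for the rows of one partition, ordered by clustering key.
    static List<Map.Entry<Integer, String>> fetchPage(
            TreeMap<Integer, String> partition, Integer afterCk, int limit) {
        Map<Integer, String> tail =
                (afterCk == null) ? partition : partition.tailMap(afterCk, false);
        List<Map.Entry<Integer, String>> page = new ArrayList<>();
        for (Map.Entry<Integer, String> e : tail.entrySet()) {
            if (page.size() == limit) break;
            page.add(Map.entry(e.getKey(), e.getValue()));
        }
        return page;
    }

    // Reads the whole partition as a sequence of small requests, resuming
    // after the last clustering key returned by the previous page.
    static int readAll(TreeMap<Integer, String> partition) {
        Integer lastCk = null;
        int total = 0;
        int requests = 0;
        while (true) {
            List<Map.Entry<Integer, String>> page = fetchPage(partition, lastCk, PAGE_SIZE);
            if (page.isEmpty()) break;
            requests++;
            total += page.size();
            lastCk = page.get(page.size() - 1).getKey();
            if (page.size() < PAGE_SIZE) break; // a short page means the end was reached
        }
        System.out.println("rows=" + total + " requests=" + requests);
        return total;
    }

    public static void main(String[] args) {
        // Simulated partition with 300K rows, as in the question.
        TreeMap<Integer, String> partition = new TreeMap<>();
        for (int ck = 0; ck < 300_000; ck++) {
            partition.put(ck, "row-" + ck);
        }
        readAll(partition); // prints "rows=300000 requests=60"
    }
}
```

Each request returns at most `PAGE_SIZE` rows, so no single read has to assemble all 300K rows before the read timeout; the trade-off is more round trips (60 for 300K rows at 5,000 per page in this simulation).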