From: Carlos Rolo
Date: Wed, 11 Feb 2015 11:48:53 +0100
Subject: Re: Two problems with Cassandra
To: user@cassandra.apache.org

Hello Pavel,

What is the size of the cluster (number of nodes)? Do you need to iterate over the full 1 TB every time you do the update, or just parts of it?

IMO, there is too little information to make any kind of assessment of the problem you are having.

I would suggest trying a 2.0.x (or 2.1.1) release to see if you get the same problem.
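If the full 1 TB does have to be iterated, one common way to keep each read bounded is to split the token ring into contiguous ranges and scan them one at a time with `token(pk) > ? AND token(pk) <= ?`, rather than issuing a single unbounded `SELECT *`. A minimal sketch of the splitting step, assuming the default Murmur3Partitioner; the function `split_token_ring` and the names `ks.tbl` and `pk` are hypothetical placeholders:

```python
# Sketch: split the Murmur3 token ring into contiguous ranges so that a
# full-table scan can be issued as many small, bounded queries.
# Assumes the default Murmur3Partitioner; names below are hypothetical.

MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1     # Murmur3Partitioner maximum token

# Hypothetical keyspace/table/partition-key names; '?' marks bind variables.
RANGE_QUERY = "SELECT * FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?"


def split_token_ring(n):
    """Return n contiguous (start, end] token ranges covering the whole ring.

    Ranges are half-open to match `token(pk) > start AND token(pk) <= end`.
    Murmur3Partitioner never assigns MIN_TOKEN to a key, so starting the
    first range just above it still covers every partition.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic avoids float rounding on 64-bit token values.
    bounds = [MIN_TOKEN + (span * i) // n for i in range(n)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for start, end in split_token_ring(4):
        print(start, end)
```

Each (start, end) pair would then be bound to a prepared `RANGE_QUERY` and executed one range at a time, so no single request has to cover the whole table. The number of ranges is a tuning assumption, not a value suggested in this thread.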

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Wed, Feb 11, 2015 at 11:22 AM, Pavel Velikhov <pavel.velikhov@gmail.com> wrote:
> Hi,
>
> I’m using Cassandra to store NLP data. The dataset is not that huge (about 1 TB), but I need to iterate over it quite frequently, updating the full dataset (each record, but not necessarily each column).
>
> I’ve run into two problems (I’m using the latest Cassandra):
>
> 1. I was trying to copy data from one Cassandra cluster to another via a Python driver; however, the driver confused the two instances.
> 2. While trying to update the full dataset with a simple transformation (again via the Python driver), both single-node and clustered Cassandra run out of memory no matter what settings I try, even when I put a lot of sleeps into the mix. However, simpler transformations (updating just one column, especially when there is a lot of processing overhead) work just fine.
>
> I’m really concerned about #2, since we’re moving all heavy processing to a Spark cluster and will expand it, and I would expect much heavier traffic to/from Cassandra. Any hints, war stories, etc. would be very much appreciated!
>
> Thank you,
> Pavel Velikhov