From: Patrik Modesto
Date: Tue, 17 Apr 2012 13:25:33 +0200
Subject: Re: Poor write performance with secondary index
To: user@cassandra.apache.org

Hi Aaron,

thanks for the reply. I suspected it might be the read-and-write that
causes the slower updates.

Regards,
P.

On Tue, Apr 17, 2012 at 11:52, aaron morton wrote:
> Secondary indexes require a read and a write (potentially two) for every
> update. Regular mutations are no-look writes and are much faster.
>
> Just like in an RDBMS, it's more efficient to insert data and then create the
> index than to insert data with the index present.
>
> An alternative is to create SSTables in the Hadoop jobs and bulk load them
> into the cluster.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2012, at 2:51 AM, Patrik Modesto wrote:
>
> Hi,
>
> I've a 4 node test cluster running Cassandra 1.0.9, 32GB memory, 4x
> 1TB disks. I've two keyspaces, rfTest2 (RF=2) and rfTest3 (RF=3).
> There are two CFs, one with source data and one with a secondary index:
>
> create column family UrlGroup
>    with column_type=Standard
>    and comparator=UTF8Type
>    and default_validation_class=UTF8Type
>    and key_validation_class=UTF8Type
>    and column_metadata=
>    [{
>        column_name: groupId,
>        validation_class: UTF8Type,
>        index_type: KEYS
>    }];
>
> I'm running a Hadoop MapReduce job, reading the source CF and creating 3
> mutations for each row-key in the UrlGroup CF.
>
> The MapReduce job runs for 30 minutes. When I remove the secondary index,
> it runs for just 10 minutes. There are 26,273,544 mutations
> total.
>
> Also with the secondary index, the nodes show very high load 50+ and
> iowait 70%+. Without the secondary index the load is ~5 and iowait ~10%.
>
> What may be the problem?
>
> Regards,
> Patrik
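
The read-before-write cost described in the reply can be illustrated with a rough sketch. This is a toy in-memory model in Python, not Cassandra's storage engine: the `ToyStore` class, its counters, and the `run` helper are all hypothetical names made up for illustration. It only counts how many reads and writes an update triggers with and without an index on one column.

```python
from collections import defaultdict


class ToyStore:
    """Toy model of a column family with an optional index on one column.

    Hypothetical illustration only; it does not reflect Cassandra internals.
    """

    def __init__(self, indexed):
        self.indexed = indexed
        self.rows = {}                  # row key -> value of the indexed column
        self.index = defaultdict(set)   # column value -> set of row keys
        self.reads = 0
        self.writes = 0

    def update(self, key, value):
        if self.indexed:
            # Read before write: the old value is needed so that a stale
            # index entry can be removed.
            self.reads += 1
            old = self.rows.get(key)
            if old is not None and old != value:
                self.index[old].discard(key)
                self.writes += 1        # delete of the stale index entry
            self.index[value].add(key)
            self.writes += 1            # insert of the new index entry
        # Without an index this is the only work: a write with no read.
        self.rows[key] = value
        self.writes += 1


def run(indexed, n_keys=1000):
    """Insert each key once, then overwrite it; return (reads, writes)."""
    store = ToyStore(indexed)
    for i in range(n_keys):
        store.update(f"url{i}", "groupA")
        store.update(f"url{i}", "groupB")
    return store.reads, store.writes


if __name__ == "__main__":
    print("without index (reads, writes):", run(False))
    print("with index    (reads, writes):", run(True))
```

In this model every indexed update pays one read plus one or two extra writes, which matches the "a read and a write (potentially two)" description above. How that translates into actual load and iowait on a real cluster depends on factors the sketch does not model.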