From: Eric Evans
Date: Fri, 15 Feb 2013 22:10:03 -0600
Subject: Re: virtual nodes + map reduce = too many mappers
To: user@cassandra.apache.org

On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo wrote:
> Seems like the hadoop Input format should combine the splits that are
> on the same node into the same map task, like Hadoop's
> CombinedInputFormat can. I am not sure who recommends vnodes as the
> default, because this is now the second problem (that I know of) of
> this class where vnodes has extra overhead,
> https://issues.apache.org/jira/browse/CASSANDRA-5161
>
> This seems to be the standard operating practice in c* now, enable
> things in the default configuration like new partitioners and newer
> features like vnodes, even though they are not heavily tested in the
> wild or well understood, then deal with fallout.

Except that it is not in fact enabled by default; the default remains
1-token-per-node.

That said, the only way that a feature like this will ever be heavily
tested in the wild, and well understood, is if it is actually put to use.

Speaking only for myself, I am grateful to users like Cem who test new
features and report the issues they find.

> On Fri, Feb 15, 2013 at 11:52 AM, cem wrote:
>> Hi All,
>>
>> I have just started to use virtual nodes. I set the number of nodes to 256
>> as recommended.
>>
>> The problem that I have is when I run a mapreduce job it creates node * 256
>> mappers. It creates node * 256 splits. this effects the performance since
>> the range queries have a lot of overhead.
>>
>> Any suggestion to improve the performance? It seems like I need to lower the
>> number of virtual nodes.
>>
>> Best Regards,
>> Cem
>>
>>

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu