From: Jonathan Ellis
Date: Fri, 29 Mar 2013 10:46:01 -0500
Subject: Re: Vnodes - HUNDRED of MapReduce jobs
To: user@cassandra.apache.org

My point is that if you have over 16MB of data per node, you're going to
get thousands of map tasks (that is: hundreds per node) with or without
vnodes.

On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo wrote:
> Every map reduce task typically has a minimum Xmx of 256MB memory. See
> mapred.child.java.opts...
> So if you have a 10 node cluster with 256 vnodes... You will need to spawn
> 2,560 map tasks to complete a job.
> And a 10 node hadoop cluster with 5 map slots a node... You have 50 map
> slots.
>
> Wouldn't it be better if the input format spawned 10 map tasks instead of
> 2,560?
>
>
> On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis wrote:
>>
>> I still don't see the hole in the following reasoning:
>>
>> - Input splits are 64k by default. At this size, map processing time
>> dominates job creation.
>> - Therefore, if job creation time dominates, you have a toy data set
>> (< 64K * 256 vnodes = 16 MB)
>>
>> Adding complexity to our inputformat to improve performance for this
>> niche does not sound like a good idea to me.
>>
>> On Thu, Mar 28, 2013 at 8:40 AM, cem wrote:
>> > Hi Alicia,
>> >
>> > Cassandra input format creates as many mappers as there are vnodes.
>> > It is a known issue. You need to lower the number of vnodes :(
>> >
>> > I have a simple solution for that and am ready to write a patch.
>> > Should I create a ticket about that? I don't know the procedure for
>> > that.
>> >
>> > Regards,
>> > Cem
>> >
>> >
>> > On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong
>> > wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I have 3 nodes of Cassandra 1.2.3 & edited the cassandra.yaml for
>> >> vnodes.
>> >>
>> >> When I execute an M/R job, the console shows HUNDREDS of map tasks.
>> >>
>> >> May I know, is this normal with vnodes? If yes, it has slowed down
>> >> the M/R job.
>> >>
>> >>
>> >> Thanks
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
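
For anyone following the numbers in this thread, here is a rough Python sketch of the arithmetic being argued over. The inputs (10 nodes, 256 vnodes per node, 5 map slots per node, a 64K default split size) are the figures quoted above. The function names and the "scheduling waves" estimate are illustrative assumptions of mine and are not part of Cassandra or Hadoop. The threshold is left as a bare number because the thread itself reads 64K * 256 as roughly 16 MB.

```python
import math

# Figures quoted in the thread; adjust for your own cluster.
NODES = 10
VNODES_PER_NODE = 256
MAP_SLOTS_PER_NODE = 5
DEFAULT_SPLIT_SIZE = 64 * 1024  # "64k by default", units as used in the thread


def map_tasks(nodes, vnodes_per_node):
    """Lower bound on map tasks when every vnode range yields at least one split."""
    return nodes * vnodes_per_node


def scheduling_waves(tasks, nodes, slots_per_node):
    """Hypothetical helper: rounds of execution needed to run all tasks
    when at most nodes * slots_per_node tasks can run at once."""
    return math.ceil(tasks / (nodes * slots_per_node))


def toy_threshold(split_size, vnodes_per_node):
    """Per-node size below which each vnode range holds less than one full split."""
    return split_size * vnodes_per_node


tasks = map_tasks(NODES, VNODES_PER_NODE)
print("map tasks:", tasks)
print("map slots:", NODES * MAP_SLOTS_PER_NODE)
print("scheduling waves:", scheduling_waves(tasks, NODES, MAP_SLOTS_PER_NODE))
print("toy data set threshold per node:", toy_threshold(DEFAULT_SPLIT_SIZE, VNODES_PER_NODE))
```

With the thread's figures this gives 2,560 map tasks against 50 map slots. The last line is the 64K * 256 product behind the "toy data set" argument: above that per-node size, the split size alone already produces hundreds of tasks per node, with or without vnodes.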