To: "user@hadoop.apache.org"
Date: Tue, 12 May 2015 19:57:36 -0000
Subject: Smaller block size for more intense jobs

Hello,

I'm not sure whether I should set the block size to less than 64 MB when my mappers have to do intensive computations.

I know that larger files are generally better, since replication and the NameNode are weak points, but I don't have that much data, and the operations that need to be performed on it are intensive.

It looks like a smaller block size would be better (at least until there is more data), so that multiple mappers get instantiated and can share the computation.
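
To make that reasoning concrete, here is a rough sketch of the arithmetic. It assumes one map task per input split and a split size equal to the block size, which is only an approximation of what the input format actually does. The 200 MB file size and the function name are hypothetical, chosen only for illustration.

```python
MB = 1024 * 1024


def estimated_map_tasks(file_size_bytes: int, split_size_bytes: int) -> int:
    """Rough number of input splits (and so map tasks) for one splittable file.

    Assumption: each split is one block, so the count is the file size
    divided by the block size, rounded up. Real input formats may merge a
    small trailing remainder into the previous split.
    """
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


if __name__ == "__main__":
    file_size = 200 * MB  # hypothetical input size
    for block_mb in (64, 32, 16):
        tasks = estimated_map_tasks(file_size, block_mb * MB)
        print(f"{block_mb} MB blocks -> about {tasks} map tasks")
```

Under these assumptions, halving the block size roughly doubles the number of mappers that can work on the same file in parallel.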


I'm currently asking about Hadoop 1, not YARN, but a heads-up about the same issue with YARN would be appreciated.

Thanks,

Marko