Subject: Re: Doubts on compressed file
From: Niels Basjes
To: user@hadoop.apache.org
Date: Wed, 7 Nov 2012 13:47:22 +0100

Hi,

> If a zip file (Gzip) is loaded into HDFS, will it get split into blocks and
> stored in HDFS?

Yes.

> I understand that a single mapper can work with Gzip as it reads the entire
> file from beginning to end... In that case, if the Gzip file size is larger
> than 128 MB, will it get split into blocks and stored in HDFS?

Yes, and then the mapper will read the other parts of the file (the blocks that are not stored on its own node) over the network.
So what I do is upload such files with a bigger HDFS block size, so the mapper has "the entire file" locally.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
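
P.S. A rough sketch of the "bigger block size" upload mentioned above, in case it helps. It only computes a block size large enough to hold the whole file and assembles the upload command; it does not run anything. The helper names (block_size_for, put_command) are hypothetical, and the 1 MiB rounding is my own assumption, chosen so the value stays a multiple of the 512-byte checksum chunk. The property is called dfs.blocksize in newer Hadoop releases and dfs.block.size in older ones, so check which one your version expects.

```python
MIB = 1024 * 1024


def block_size_for(file_size: int, granularity: int = MIB) -> int:
    """Return the smallest multiple of `granularity` that is >= file_size.

    Rounding to whole MiB is an assumption of this sketch; it keeps the
    result a multiple of the default 512-byte checksum chunk size.
    """
    units = max(1, -(-file_size // granularity))  # ceiling division, at least 1
    return units * granularity


def put_command(local_path: str, hdfs_dir: str, file_size: int,
                prop: str = "dfs.blocksize") -> list:
    """Build (but do not run) an upload command with a per-file block size.

    `prop` may need to be "dfs.block.size" on older Hadoop versions.
    """
    size = block_size_for(file_size)
    return ["hadoop", "fs", "-D", f"{prop}={size}", "-put", local_path, hdfs_dir]


if __name__ == "__main__":
    # Hypothetical 130 MiB gzip file; in practice use os.path.getsize(path).
    print(" ".join(put_command("logs.gz", "/data", 130 * MIB)))
```

With the block size at least as large as the file, the file occupies a single block, so a mapper scheduled on a node holding that block can read all of it locally. Whether the scheduler actually places the mapper there is still up to the cluster.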