Date: Sun, 2 Dec 2012 17:03:03 -0500
From: Jeff LI
To: user@hadoop.apache.org
Subject: Input splits for sequence file input

Hello,

I was reading about the relationship between input splits and HDFS blocks, and a question came up:

If a logical record crosses an HDFS block boundary, say between block#1 and block#2, does the mapper assigned this input split ask for (1) both blocks, or (2) block#1 and just the part of block#2 that this logical record extends into, or (3) block#1 and the part of block#2 up to some sync point that covers this particular logical record? Note that the input is a sequence file.

I guess my question really is: does Hadoop operate on a block basis, or does it respect some sort of logical structure within a block when it feeds the mappers with input data?

Cheers

Jeff
