From: Jay Vyas
Subject: Re: Input splits for sequence file input
Date: Mon, 3 Dec 2012 00:52:56 -0500
To: user@hadoop.apache.org
This question is fundamentally flawed: it assumes that a mapper will ask for anything.

The mapper class's "run" method reads from a record reader. The question you really should ask is:

How does a RecordReader read records across block boundaries?
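To make that division of labor concrete, here is a self-contained toy in plain Java (no Hadoop dependency; the names only mirror, and simplify, the real org.apache.hadoop.mapreduce classes). The point it illustrates: the mapper's run loop only ever calls nextKeyValue() on a record reader and consumes whole records, so bytes, blocks, and split boundaries are entirely the reader's problem.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy mapper/record-reader pair. The mapper never sees bytes, blocks, or
// splits -- it just pulls complete records from the reader, one at a time.
public class MapperLoopDemo {

    // Stand-in for a RecordReader: hands out complete records on demand.
    static class ToyRecordReader {
        private final Iterator<String> it;
        private String current;

        ToyRecordReader(List<String> records) { this.it = records.iterator(); }

        boolean nextKeyValue() {          // advance to the next record, if any
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }

        String getCurrentValue() { return current; }
    }

    // Stand-in for Mapper.run(): a plain pull loop over the reader.
    static List<String> run(ToyRecordReader reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentValue().toUpperCase()); // the "map" step
        }
        return out;
    }

    public static void main(String[] args) {
        ToyRecordReader reader = new ToyRecordReader(List.of("alpha", "bravo"));
        System.out.println(run(reader)); // [ALPHA, BRAVO]
    }
}
```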

Jay Vyas
http://jayunit100.blogspot.com

On Dec 2, 2012, at 9:08 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

The createRecordReader method will handle the record boundary issue. You can check the code for details.

On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <uniquejeff@gmail.com> wrote:
Hello,

I was reading about the relationship between input splits and HDFS blocks, and a question came up:

If a logical record crosses an HDFS block boundary, let's say block#1 and block#2, does the mapper assigned to this input split ask for (1) both blocks, or (2) block#1 and just the part of block#2 that this logical record extends to, or (3) block#1 and the part of block#2 up to some sync point that covers this particular logical record?  Note the input is a sequence file.

I guess my question really is: does Hadoop operate on a block basis, or does it respect some sort of logical structure within a block when it's trying to feed the mappers with input data?

Cheers

Jeff




--
Best Regards

Jeff Zhang
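For a sequence file, the behavior the reader implements is closest to the original question's option (3), and sync markers are the mechanism. Below is a hedged, self-contained sketch (not the real SequenceFileRecordReader, whose on-disk format is considerably more involved): a reader assigned the split [start, end) first seeks forward to the first sync marker at or after start, then reads whole records until its position crosses end. A record that straddles a block/split boundary is therefore finished by the reader that owns the sync point before it, and skipped by the next reader, so every record is read exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "sync marker" file: SYNC precedes every record. Splits are arbitrary
// byte ranges; readers use the markers to agree on who owns each record.
public class SyncSplitDemo {
    static final char SYNC = '#';

    // Read every record whose sync marker lies in [start, end). The last
    // record may extend past end (into the "next block"); that is fine,
    // because the next reader seeks forward to ITS first sync marker,
    // so nothing is read twice and nothing is lost.
    static List<String> readSplit(String data, int start, int end) {
        int pos = start;
        // Seek to the first sync marker at or after the split start.
        while (pos < data.length() && data.charAt(pos) != SYNC) pos++;
        List<String> records = new ArrayList<>();
        while (pos < end && pos < data.length()) {
            pos++; // step over the sync marker
            int recStart = pos;
            while (pos < data.length() && data.charAt(pos) != SYNC) pos++;
            records.add(data.substring(recStart, pos));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "#alpha#bravo#charlie";
        // Pretend the HDFS block boundary falls at byte 9, inside "bravo":
        // the first reader finishes "bravo"; the second starts at "charlie".
        System.out.println(readSplit(data, 0, 9));             // [alpha, bravo]
        System.out.println(readSplit(data, 9, data.length())); // [charlie]
    }
}
```

So the split is computed on a byte/block basis, but the reader re-imposes the logical record structure on top of it via the sync points.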